Self-hosted Document Management with Paperless-ngx
I spent an entire day deploying Paperless-ngx on my home NAS for document digitization. Five pitfalls, each costing me 1-2 hours. Here's everything I learned so you can get it right the first time.
What is Paperless-ngx
Paperless-ngx is an open-source document management system built on Django. It stores its data in SQLite by default and also supports PostgreSQL and MariaDB. Key features:
- OCR scanning (Tesseract)
- Auto tagging/correspondent/document type classification
- Full-text search
- Multiple import methods: consume folder, IMAP, Webhook
- Official demo: demo.paperless-ngx.com (login: demo/demo)
Official deployment is via Docker Compose. GitHub: 40,032 ⭐ as of 2026-04-29, making it the most active project in the self-hosted document management space.
Pitfall 1: Default SQLite Locks Up With Large Archives
**Symptom:** After adding ~2000 scanned pages, the Web UI starts throwing frequent 500 errors, and the logs show "database is locked".
**Root cause:** Paperless-ngx defaults to SQLite. SQLite allows only one writer at a time, so when the consumer is importing while the Web UI is also hitting the database, the two contend for the same database-wide write lock. The problem gets significantly worse once the archive passes 5000 pages.
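The lock contention can be reproduced outside Paperless with nothing but Python's standard sqlite3 module. The following is a minimal sketch, not Paperless code: the table name and titles are made up, and the second connection uses timeout=0 so that it fails immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile


def demo_write_lock() -> str:
    """Return the error a second writer gets while the first holds the write lock."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

    # isolation_level=None means autocommit, so the explicit BEGIN/COMMIT
    # statements below are what control the transaction.
    writer_a = sqlite3.connect(path, isolation_level=None)
    writer_a.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")

    # Writer A takes the database-wide write lock and keeps it open.
    writer_a.execute("BEGIN IMMEDIATE")
    writer_a.execute("INSERT INTO documents (title) VALUES ('scan-0001')")

    # Writer B refuses to wait (timeout=0), so its write fails right away.
    writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        writer_b.execute("INSERT INTO documents (title) VALUES ('scan-0002')")
        message = "no error"
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        writer_a.execute("COMMIT")
        writer_a.close()
        writer_b.close()
    return message


if __name__ == "__main__":
    print(demo_write_lock())
```

In a real deployment the second writer waits for a while before giving up, which is why the errors only show up once imports become long and frequent enough.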
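**Solution:** Switch to PostgreSQL. Add the following to the environment section of docker-compose.yml. PAPERLESS_DBHOST is the setting that actually makes Paperless connect to PostgreSQL, and it must match the name of the database service:
PAPERLESS_DBENGINE: postgresql
PAPERLESS_DBHOST: postgres
PAPERLESS_DBPASS: your_secure_password_here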
Add PostgreSQL service to depends_on:
services:
  paperless:
    depends_on:
      postgres:
        condition: service_healthy
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: your_secure_password_here
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U paperless"]
      interval: 10s
      timeout: 5s
      retries: 5
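⚠️ Warning: Paperless must be able to reach PostgreSQL the first time it starts, otherwise its database initialization fails. The depends_on health-check condition takes care of the ordering when you start the whole stack at once. If you start the services by hand, bring up Postgres first: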
docker compose up -d postgres
# Wait 30s for health check to pass
docker compose up -d paperless
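Instead of sleeping for a fixed 30 seconds, a script can poll until the database port accepts connections. This is a rough sketch with two assumptions: the helper name wait_for_port is made up for illustration, and port 5432 must be reachable from wherever the script runs, which the compose file above does not arrange on its own because it publishes no ports. A successful TCP connect is also a weaker signal than pg_isready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical host/port; adjust to wherever Postgres is exposed.
    print(wait_for_port("127.0.0.1", 5432, timeout=30))
```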
Pitfall 2: Chinese OCR Recognition Only 30% Accurate
Symptom: After importing Chinese scanned documents, search accuracy is extremely low — most Chinese characters are recognized as garbled text.
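**Root cause:** The Paperless-ngx Docker image ships Tesseract data for only a handful of languages (English, German, Italian, Spanish, and French). Chinese needs the additional tesseract-ocr-chi-sim language pack, which isn't included by default.
**Solution:** In docker-compose.yml, list the extra pack to install and tell the OCR engine to use it. Note the different spelling: the package name uses a hyphen, the language code an underscore:
PAPERLESS_OCR_LANGUAGES: chi-sim
PAPERLESS_OCR_LANGUAGE: chi_sim+eng
For Traditional Chinese, use chi-tra and chi_tra. Restart Paperless after changing these settings so the pack gets installed; documents consumed from then on are recognized with the Chinese model, while existing documents have to be reprocessed. Processing speed drops ~40% (Chinese OCR is more complex than English), but accuracy jumps from 30% to 85%+.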
To reprocess already-imported documents: Paperless dashboard → Document list → select the target documents → "Actions" → "Reprocess" (older releases label it "Redo OCR"). To redo the whole archive in one go, run the document_archiver management command with --overwrite inside the container.
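One detail that is easy to get wrong is the naming. PAPERLESS_OCR_LANGUAGES takes package-style names with a hyphen (chi-sim), while PAPERLESS_OCR_LANGUAGE takes Tesseract language codes with an underscore, joined by a plus sign (chi_sim+eng). The helper below is a hypothetical convenience function and not part of Paperless; it only illustrates the conversion.

```python
def to_ocr_language(packages: list[str]) -> str:
    """Turn package-style names (chi-sim) into a Tesseract language string (chi_sim+eng)."""
    # Tesseract codes use underscores; multiple languages are joined with "+".
    codes = [name.strip().replace("-", "_") for name in packages if name.strip()]
    return "+".join(codes)


if __name__ == "__main__":
    print(to_ocr_language(["chi-sim", "eng"]))
```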
Pitfall 3: Consume Folder Files Pile Up, New Documents Don't Enter System
Symptom: Dropped a PDF into the consume folder, waited 5 minutes with no response, document not visible in Paperless UI.
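Debugging steps: First confirm that the host consume folder is mounted at the path Paperless actually watches. In the official image that is /usr/src/paperless/consume unless PAPERLESS_CONSUMPTION_DIR says otherwise:
volumes:
  - ./consume:/usr/src/paperless/consume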
Then check if Consumer is running:
docker compose logs -f paperless | grep -i consume
If you see ERROR: Unsupported file type, the file format isn't supported. Paperless-ngx v2.x accepts PDF, images (JPG/PNG/TIFF/GIF/BMP), and plain text out of the box. Office documents (DOC/DOCX/ODT) and email files (EML/MSG) are only parsed when the optional Tika and Gotenberg services are enabled. .pages and other Apple-proprietary formats have to be converted first.
Another common cause is file ownership. If the host consume folder isn't owned by 1000:1000, the Paperless user inside the container (UID 1000 by default) may not be able to read it:
chown -R 1000:1000 ./consume
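Before changing ownership wholesale, it can help to see which files are actually the problem. The sketch below is a hypothetical helper that inspects the classic owner/group/other permission bits on the host; it assumes UID and GID 1000 as described above and ignores ACLs, directory traversal permissions, and root's special privileges.

```python
import stat
from pathlib import Path


def readable_by(path: Path, uid: int, gid: int) -> bool:
    """Check the read bit that applies to uid/gid (owner, then group, then other)."""
    info = path.stat()
    if info.st_uid == uid:
        return bool(info.st_mode & stat.S_IRUSR)
    if info.st_gid == gid:
        return bool(info.st_mode & stat.S_IRGRP)
    return bool(info.st_mode & stat.S_IROTH)


def unreadable_files(consume_dir: Path, uid: int = 1000, gid: int = 1000) -> list[Path]:
    """List files under consume_dir that the given user could not read."""
    return sorted(
        p for p in consume_dir.rglob("*")
        if p.is_file() and not readable_by(p, uid, gid)
    )


if __name__ == "__main__":
    for blocked in unreadable_files(Path("./consume")):
        print(blocked)
```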
Pitfall 4: Reverse Proxy Returns 403 on All API Requests
Symptom: Through Nginx reverse proxy to Paperless, the main page loads normally, but all API calls (search, upload, tag management) return 403 Forbidden.
**Root cause:** Paperless-ngx uses Django's CSRF protection, which validates Referer and Origin headers by default. When Nginx reverse proxy is misconfigured, Django receives HTTP_HOST as an internal address, causing validation failure.
Correct Nginx config template:
location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    # WebSocket support required for Paperless real-time notifications
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
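The Host and X-Forwarded-Proto headers are what resolve the 403: they let Django see the public hostname and scheme instead of the internal address. If requests are still rejected, set PAPERLESS_URL to the public URL so Django trusts that origin. The Upgrade and Connection headers serve a different purpose: without them, Paperless's real-time notifications fail completely. That failure doesn't surface as a 403; instead you'll see "upload succeeds but page doesn't refresh."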
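To see why a wrong Host or scheme leads to a 403, it helps to model the check in a few lines. The function below is a deliberately simplified illustration and not Django's actual code: real Django also consults its trusted-origins list and only honors X-Forwarded-Proto when configured to do so. The hostname docs.example.com is a placeholder.

```python
from urllib.parse import urlsplit


def origin_matches(origin: str, host_header: str, forwarded_proto: str = "http") -> bool:
    """Simplified model: the browser's Origin must equal scheme://host as the backend sees it."""
    expected = f"{forwarded_proto}://{host_header}".lower()
    parts = urlsplit(origin)
    return f"{parts.scheme}://{parts.netloc}".lower() == expected


if __name__ == "__main__":
    browser_origin = "https://docs.example.com"
    # Proxy forwards Host and scheme correctly: the origins agree.
    print(origin_matches(browser_origin, "docs.example.com", "https"))
    # Proxy leaks the internal address as Host: mismatch, request rejected.
    print(origin_matches(browser_origin, "127.0.0.1:8000", "http"))
```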
Pitfall 5: IMAP Mail Attachment Filenames Become Garbled
**Symptom:** Documents sent to Paperless via email have all attachments named ATT0001.pdf (garbled), which can't be OCR-recognized.
**Root cause:** Paperless-ngx's IMAP Consumer uses Python's email library to parse messages. When an email's attachment Content-Disposition header doesn't include a filename parameter (especially common with iOS native Mail app), attachments get assigned default sequential names like ATT0001.
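The missing filename parameter is easy to observe with the standard library alone. The sketch below parses a hand-written sample message (addresses, boundary, and payload are invented for illustration) and shows that the second attachment has no filename to fall back on. It uses Python's email package directly and does not reproduce Paperless's own mail handling.

```python
from email import message_from_string

RAW = """\
From: phone@example.com
To: paperless@example.com
Subject: scan
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: application/pdf; name="invoice.pdf"
Content-Disposition: attachment; filename="invoice.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--BOUND
Content-Type: application/pdf
Content-Disposition: attachment
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--BOUND--
"""


def attachment_filenames(raw: str) -> list:
    """Return each attachment's filename, or None when the headers carry none."""
    message = message_from_string(raw)
    return [
        part.get_filename()
        for part in message.walk()
        if part.get_content_disposition() == "attachment"
    ]


if __name__ == "__main__":
    print(attachment_filenames(RAW))
```

Whatever consumes the message has to invent a name for the second part, which is where generic names come from.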
**Solution:** Set the following in docker-compose.yml (mail accounts and rules themselves are configured in the web UI, not through environment variables):
PAPERLESS_CONSUMER_RECURSIVE: "false"
PAPERLESS_EMAIL_FROM: "paperless@yourdomain.com"
In Paperless UI → Settings → Mail, configure:
- **Attachment Filename Schema**: `{name}`
- **Filename Style**: Select `Original Filename` (if email attachments have original filenames)
For iOS users, another effective workaround: don't send photos directly as attachments. Instead, save photos to iCloud Drive or Google Drive first, then send the share link to Paperless's email address.
Summary: Paperless-ngx Deployment Checklist
Before deploying Paperless-ngx, confirm these 5 points:
1. Use PostgreSQL: SQLite started locking up at around 2000 pages in my setup
2. Install OCR language packs upfront: Chinese recognition requires chi-sim
3. Consume folder permissions: 755 + UID 1000 is the standard
4. Nginx reverse proxy: must include WebSocket upgrade headers
5. Email attachment naming: iOS users watch out for garbled filenames
Official one-line install script:
bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
It supports Ubuntu, Debian, and Raspberry Pi OS and handles all dependencies automatically.
For archives under 10,000 pages, Paperless-ngx runs smoothly on a 2-core 4GB machine. Beyond that scale, move up to 8GB of RAM and keep PostgreSQL as the database.