Self-hosted Document Management with Paperless-ngx

Tags: Paperless-ngx, Docker, PostgreSQL, NAS, Document Digitization, MiniMax

I spent an entire day deploying Paperless-ngx on my home NAS for document digitization. Five pitfalls, each costing me 1-2 hours. Here's everything I learned so you can get it right the first time.

What is Paperless-ngx

Paperless-ngx is an open-source document management system built on Django. It turns scanned documents into a searchable archive using OCR, and it can store its data in SQLite (the default) or PostgreSQL.

Official deployment is via Docker Compose. The GitHub repository had 40,032 ⭐ as of 2026-04-29, which makes it one of the most popular projects in the self-hosted document management space.

Pitfall 1: Default SQLite Locks Up With Large Archives

**Symptom:** After adding ~2000 scanned pages, the Web UI frequently throws 500 errors, and the logs show "database is locked".

**Root cause:** Paperless-ngx defaults to SQLite. SQLite allows only one writer at a time, so when writes overlap (the consumer importing while the Web UI handles requests), they contend for the write lock. The problem worsens significantly at archive sizes over 5000 pages.
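The lock contention can be reproduced outside Paperless with Python's built-in sqlite3 module. This is a minimal sketch, not Paperless code: the table name, the file name, and the helper names demo_lock_contention and run_demo are made up for illustration. One connection holds the write lock the way a long import would, and a second connection tries to write at the same time.

```python
import os
import sqlite3
import tempfile


def demo_lock_contention(db_path):
    """Return the error a second writer sees while the first holds the write lock."""
    # isolation_level=None gives explicit control over transactions.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute(
        "CREATE TABLE IF NOT EXISTS documents (id INTEGER PRIMARY KEY, title TEXT)"
    )
    # BEGIN IMMEDIATE takes the write lock right away and keeps it until
    # COMMIT or ROLLBACK, similar to a long-running import.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("INSERT INTO documents (title) VALUES ('scan-0001')")

    # timeout=0 makes the second connection fail at once instead of waiting.
    other = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        other.execute("INSERT INTO documents (title) VALUES ('scan-0002')")
        message = None
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        other.close()
        writer.execute("ROLLBACK")
        writer.close()
    return message


def run_demo():
    """Run the demo against a throwaway database file."""
    with tempfile.TemporaryDirectory() as tmp:
        return demo_lock_contention(os.path.join(tmp, "demo.sqlite3"))
```

Calling run_demo() returns the same "database is locked" message that shows up in the Paperless logs. PostgreSQL avoids this because it handles concurrent writers with row-level locking rather than a single database-wide write lock.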

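**Solution:** Switch to PostgreSQL. Add the following to the environment section of the paperless service in docker-compose.yml. PAPERLESS_DBHOST is the setting that actually moves Paperless off SQLite; without it Paperless keeps using SQLite:

PAPERLESS_DBHOST: postgres  # name of the PostgreSQL service defined below
PAPERLESS_DBENGINE: postgresql
PAPERLESS_DBPASS: your_secure_password_here

Then define the PostgreSQL service and make Paperless wait for it via depends_on: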

services:
  paperless:
    depends_on:
      postgres:
        condition: service_healthy
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: your_secure_password_here
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U paperless"]
      interval: 10s
      timeout: 5s
      retries: 5

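⚠️ Warning: Paperless can only initialize its database once PostgreSQL is accepting connections. If Paperless starts before Postgres is ready, initialization fails. The depends_on health condition above covers this when both services start together; when starting services by hand, use this sequence: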

docker compose up -d postgres
# Wait 30s for health check to pass
docker compose up -d paperless
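Instead of a fixed 30-second wait, the readiness check can be polled. This is a minimal Python sketch, assuming the service names postgres and paperless and the database user paperless from the compose file above; wait_until, postgres_ready, and main are hypothetical helper names.

```python
import subprocess
import time


def wait_until(probe, timeout=60.0, interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def postgres_ready():
    """pg_isready exits with status 0 once the server accepts connections."""
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", "postgres", "pg_isready", "-U", "paperless"],
        capture_output=True,
    )
    return result.returncode == 0


def main():
    """Start Paperless only after Postgres reports ready."""
    if not wait_until(postgres_ready):
        raise SystemExit("postgres did not become ready in time")
    subprocess.run(["docker", "compose", "up", "-d", "paperless"], check=True)
```

Run main() from the directory that contains docker-compose.yml, after docker compose up -d postgres. The probe is passed in as a function so the polling loop can be reused or tested without Docker.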

Pitfall 2: Chinese OCR Recognition Only 30% Accurate

**Symptom:** After importing Chinese scanned documents, search accuracy is extremely low, and most Chinese characters are recognized as garbled text.

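**Root cause:** Tesseract only uses the language data it is given, and the Paperless-ngx Docker image doesn't include Chinese OCR data by default. Chinese needs the additional Tesseract language package (tesseract-ocr-chi-sim for Simplified Chinese), and Paperless must also be told to use it.

**Solution:** In docker-compose.yml, set both the languages to install and the languages to use for OCR. Note the different spelling: the install list uses hyphens, while the OCR setting uses Tesseract codes with underscores, joined by "+":

PAPERLESS_OCR_LANGUAGES: chi-sim eng   # language packs to install
PAPERLESS_OCR_LANGUAGE: chi_sim+eng    # languages Tesseract actually uses

For Traditional Chinese, use chi-tra (and chi_tra in PAPERLESS_OCR_LANGUAGE). Recreate the Paperless container afterwards so the language data gets installed. New documents are then recognized with the Chinese model; documents imported earlier have to be reprocessed. In my setup processing speed dropped ~40% (Chinese OCR is more complex than English), but accuracy jumped from 30% to 85%+.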
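The two spellings are easy to mix up: PAPERLESS_OCR_LANGUAGES takes package-style names separated by spaces (chi-sim), while PAPERLESS_OCR_LANGUAGE, the setting that selects what Tesseract uses, takes Tesseract codes joined by "+" (chi_sim). A small sketch of the conversion, where to_ocr_language is a hypothetical helper name and not part of Paperless:

```python
def to_ocr_language(install_names):
    """Convert a PAPERLESS_OCR_LANGUAGES value into a PAPERLESS_OCR_LANGUAGE value.

    Example input: "chi-sim eng" (space-separated, hyphens).
    """
    # Tesseract codes use underscores where the package names use hyphens.
    codes = [name.replace("-", "_") for name in install_names.split()]
    # Multiple OCR languages are joined with "+".
    return "+".join(codes)
```

For the configuration in this article, to_ocr_language("chi-sim eng") gives chi_sim+eng.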

To reprocess already-imported documents: Paperless dashboard → Document list → select the target documents → "More Actions" (top right) → "Rerun OCR". Menu labels differ between versions; in recent releases the bulk action is called "Reprocess". For batch processing, enable "Use OCR for older documents" in OCR Settings.

Pitfall 3: Consume Folder Files Pile Up, New Documents Don't Enter System

**Symptom:** I dropped a PDF into the consume folder and waited 5 minutes with no response; the document never appeared in the Paperless UI.

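**Debugging steps:** First confirm the consume folder is mapped to the path Paperless watches inside the container. In the official image that is /usr/src/paperless/consume, unless PAPERLESS_CONSUMPTION_DIR points elsewhere:

volumes:
  - ./consume:/usr/src/paperless/consume

Then check whether the consumer is running: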

docker compose logs -f paperless | grep -i consume

If you see ERROR: Unsupported file type, the file format isn't supported. Formats supported by Paperless-ngx v2.x: PDF, images (JPG/PNG/TIFF/GIF/BMP), plain text, Office documents (DOC/DOCX/ODT), and email files (EML/MSG). Office documents are only accepted when the optional Tika and Gotenberg services are enabled. .pages and other Apple-proprietary formats must be converted first.
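A quick pre-check can flag files that would be rejected before they ever reach the consume folder. This is a sketch only: the extension set is derived from the format list above (with the common .jpeg, .tif, and .txt spellings added as an assumption), and unsupported_files is a hypothetical helper, not a Paperless API. What your installation actually accepts depends on its version and on whether Tika is enabled.

```python
import os

# Derived from the format list in this article; adjust for your setup.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp",
    ".txt",
    ".doc", ".docx", ".odt",
    ".eml", ".msg",
}


def unsupported_files(filenames):
    """Return the names whose extension is not in SUPPORTED_EXTENSIONS."""
    rejected = []
    for name in filenames:
        # splitext keeps the leading dot; compare case-insensitively.
        extension = os.path.splitext(name)[1].lower()
        if extension not in SUPPORTED_EXTENSIONS:
            rejected.append(name)
    return rejected
```

To check a real folder, pass it the directory listing, for example unsupported_files(os.listdir("./consume")).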

Another common cause: file ownership. If the host consume folder isn't owned by 1000:1000, the paperless user (UID 1000) inside the container may have no read access:

chown -R 1000:1000 ./consume
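Before running chown across a whole tree, it can help to see which paths actually have the wrong owner. This is a minimal sketch for Linux hosts, assuming the container user is UID/GID 1000 as described above; not_owned_by is a hypothetical helper name.

```python
import os


def not_owned_by(root, uid=1000, gid=1000):
    """List root and everything below it whose owner or group differs."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    mismatched = []
    for path in paths:
        # lstat inspects symlinks themselves instead of following them.
        info = os.lstat(path)
        if info.st_uid != uid or info.st_gid != gid:
            mismatched.append(path)
    return mismatched
```

An empty result from not_owned_by("./consume") means ownership is not the problem and the cause lies elsewhere, such as the volume mapping or the file type.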

Pitfall 4: Reverse Proxy Returns 403 on All API Requests

**Symptom:** When Paperless is accessed through an Nginx reverse proxy, the main page loads normally, but all API calls (search, upload, tag management) return 403 Forbidden.

**Root cause:** Paperless-ngx uses Django's CSRF protection, which validates the Referer and Origin headers by default. When the Nginx reverse proxy is misconfigured, Django receives an internal address as HTTP_HOST, so the browser's Origin no longer matches and validation fails.

Correct Nginx config template:

location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # WebSocket support required for Paperless real-time notifications
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}

The Host and X-Forwarded-Proto headers are what resolve the 403s, because they let Django see the public hostname and scheme. The Upgrade headers matter for a different reason: without WebSocket upgrade configuration, Paperless's real-time notifications fail completely, but that doesn't produce a 403. Instead you'll see "upload succeeds but page doesn't refresh."
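Why the forwarded headers matter can be shown with a deliberately simplified model of an Origin check. This is not Django's actual code: origin_accepted is a hypothetical function, and the real middleware handles more cases (Referer fallback, wildcard subdomains). The idea is the same, though: the browser's Origin must equal the scheme and host the application believes it was reached on, or appear in an explicit trusted list. Note that Django only takes the scheme from X-Forwarded-Proto when its SECURE_PROXY_SSL_HEADER setting is configured.

```python
def origin_accepted(origin, host, scheme, trusted_origins=()):
    """Simplified Origin check.

    origin: value of the browser's Origin header
    host:   Host header as received by the application
    scheme: scheme the application believes the request used
    """
    expected = f"{scheme}://{host}"
    return origin == expected or origin in trusted_origins


# Proxy forwards nothing: the app sees its internal address and plain http.
misconfigured = origin_accepted(
    "https://docs.example.com", host="127.0.0.1:8000", scheme="http"
)

# Proxy forwards Host and the original scheme.
configured = origin_accepted(
    "https://docs.example.com", host="docs.example.com", scheme="https"
)
```

Here misconfigured is False (the request would be rejected) and configured is True. docs.example.com is a placeholder domain. Paperless-ngx also exposes PAPERLESS_CSRF_TRUSTED_ORIGINS for the explicit list, which corresponds to the trusted_origins argument in this model.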

Pitfall 5: IMAP Mail Attachment Filenames Become Garbled

**Symptom:** Documents sent to Paperless via email all arrive with generic attachment names like ATT0001.pdf instead of their original filenames.

**Root cause:** Paperless-ngx's IMAP consumer relies on Python's email library to parse messages. When an attachment's Content-Disposition header doesn't include a filename parameter (especially common with the native iOS Mail app), the attachment ends up with a default sequential name like ATT0001.
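The missing-filename case is easy to reproduce with Python's standard email package. This is a sketch, not the Paperless mail code: build_message and attachment_names are hypothetical helpers, and the ATT0001-style fallback is only an illustration of the kind of placeholder a mail system has to invent when the header carries no filename.

```python
from email.message import EmailMessage


def build_message(with_filename):
    """Create a message with one PDF attachment, with or without a filename."""
    msg = EmailMessage()
    msg["Subject"] = "Scanned receipt"
    msg.set_content("See attachment.")
    if with_filename:
        msg.add_attachment(
            b"%PDF-1.4", maintype="application", subtype="pdf", filename="receipt.pdf"
        )
    else:
        # Content-Disposition is still "attachment", but has no filename parameter.
        msg.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf")
    return msg


def attachment_names(msg, fallback_prefix="ATT"):
    """Return attachment names, inventing a sequential one when none is given."""
    names = []
    for index, part in enumerate(msg.iter_attachments(), start=1):
        name = part.get_filename()  # None when no filename/name parameter exists
        if name is None:
            name = f"{fallback_prefix}{index:04d}.{part.get_content_subtype()}"
        names.append(name)
    return names
```

attachment_names(build_message(True)) yields the real filename, while attachment_names(build_message(False)) has nothing to work with and falls back to a sequential placeholder. The information is lost at the sending side, so no setting on the receiving side can recover the original name.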

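**Solution:** These are the environment settings I used in docker-compose.yml. Verify each name against the configuration reference for your version, since unrecognized variables are silently ignored: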

PAPERLESS_CONSUMER_RECURSIVE: "false"
PAPERLESS_EMAIL_TASK_KEEPalive: "false"
PAPERLESS_EMAIL_FROM: "paperless@yourdomain.com"

The mail account and mail rules are configured in the Paperless UI under Settings → Mail.

For iOS users, another effective workaround: don't send photos directly as attachments. Instead, save photos to iCloud Drive or Google Drive first, then send the share link to Paperless's email address.

Summary: Paperless-ngx Deployment Checklist

Before deploying Paperless-ngx, confirm these 5 points:

1. Use PostgreSQL: SQLite started locking up at around 2000 pages in my setup
2. Install OCR language packs upfront: Chinese recognition requires chi-sim
3. Consume folder permissions: mode 755 and owner UID 1000 is the standard
4. Nginx reverse proxy: must forward Host and include WebSocket upgrade headers
5. Email attachment naming: iOS users should watch out for generic filenames like ATT0001

Official one-line install script:

bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"

Supports Ubuntu, Debian, and Raspberry Pi OS, and automatically handles all dependencies.

For archives under 10,000 pages, Paperless-ngx runs smoothly on a 2-core, 4GB machine. Beyond that scale, plan for 8GB of RAM in addition to PostgreSQL.
