
Self-Hosting n8n + Langfuse: 5 Real Pitfalls (502 Errors, Ignored Env Vars, Mandatory Redis, OOM on a 2GB VPS, PostgreSQL Connection Exhaustion)

Tags: n8n, Langfuse, observability, Docker, PostgreSQL, Redis, self-hosting, DevOps, AI Agent

In one week in early May 2026 I crashed the box three times, and each recovery took 1-2 hours of root-cause analysis. This post documents all 5 pitfalls with error messages, diagnostic commands, root causes, and final fixes. The focus is "here's what I broke, so you don't have to".

Self-disclaimer: All versions, commands, and configurations mentioned here were checked against online sources and were current as of 2026-06-14. Please double-check the latest Langfuse and n8n documentation before applying any of this.

Pitfall 1: n8n Calling Langfuse Proxy Returns 502 Bad Gateway

In my n8n workflow I added an HTTP Request node pointing to Langfuse's ingest endpoint. The configuration looked completely correct (API key, project, endpoint all valid), but every trigger returned 502.

Error message:

HTTP 502 Bad Gateway
{"error":"upstream connect error or disconnect/reset before headers"}

**Root cause**: The n8n Docker container and the Langfuse Docker container were **not on the same Docker network**. The HTTP Request node was using http://localhost:3000, but localhost inside the n8n container refers to the n8n container itself, not the host or Langfuse.

Diagnostic command:

# Check if the n8n container can resolve the Langfuse service name
docker exec -it n8n-container sh -c "nslookup langfuse-web"
# Should return an IP on the docker network
# If it returns NXDOMAIN, the two containers are not on the same network

Fix: Put Langfuse and n8n into the same Docker Compose network, and use the service name (not localhost) in the HTTP Request node:

networks:
  observability-net:

services:
  n8n:
    networks:
      - observability-net
  langfuse-web:
    networks:
      - observability-net

Then change the n8n node URL to http://langfuse-web:3000/api/public/ingestion, **using the service name instead of localhost**. After restart, the 502 disappears immediately.
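This class of mistake is easy to lint for before it reaches a workflow. The sketch below is a hypothetical helper, not something n8n or Langfuse ships: is_loopback_url flags URLs whose host would resolve to the calling container itself. It only understands plain hostnames and IPv4 addresses.

```shell
# is_loopback_url URL
# Hypothetical helper: returns 0 when the URL's host is a loopback name,
# i.e. one that points back at whichever container makes the request.
is_loopback_url() {
  host=${1#*://}      # drop the scheme (http://, https://)
  host=${host%%/*}    # drop the path
  host=${host##*@}    # drop optional user:password@
  host=${host%%:*}    # drop the port
  case "$host" in
    localhost|127.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: the URL that caused the 502 in this pitfall
url="http://localhost:3000/api/public/ingestion"
if is_loopback_url "$url"; then
  echo "warning: $url points at the calling container, use the service name"
fi
```

The same check passes for http://langfuse-web:3000/api/public/ingestion, since a Compose service name is resolved by Docker's network DNS rather than by the container's own loopback interface.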

Pitfall 2: Environment Variables Changed in .env Are Completely Ignored by n8n

I changed N8N_PORT=5679 in the .env file, restarted the n8n container, and http://localhost:5678 was still happily serving. Nothing was listening on port 5679. WEBHOOK_URL and EXECUTIONS_DATA_PRUNE behaved the same way: inside the container echo $N8N_PORT showed 5679, but the actual listening port was 5678.

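**Root cause**: The n8n container that was serving traffic was still using **stale environment values**. Two Docker behaviors combine here. First, a container's environment is fixed when the container is created, so restarting it never re-reads .env or env_file; only recreating it does. Second, the .env file next to the compose file is used by Compose for ${VAR} substitution and reaches the container only through env_file or environment. I had also run an earlier test with docker run -e, and stopped containers from that test were still lying around, which makes it easy to inspect one container while another is serving traffic. For reference, Compose resolves a variable in the order **docker compose run -e > environment > env_file > ENV in the image**.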

Diagnostic command:

# Inspect the final in-effect environment variables inside the container
docker exec n8n-container env | grep -E "N8N_|WEBHOOK|EXECUTIONS"
# Compare with the .env file
cat .env | grep -E "N8N_|WEBHOOK|EXECUTIONS"

**Fix**: Delete all stopped n8n containers, and **set the values explicitly under environment instead of env_file in compose** (environment takes precedence over env_file, so the two cannot silently disagree):

services:
  n8n:
    environment:
      - N8N_PORT=5679
      - WEBHOOK_URL=https://your-domain.com/
      - EXECUTIONS_DATA_PRUNE=true
      - EXECUTIONS_DATA_MAX_AGE=168

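Run docker container prune before docker compose down to clear the leftover containers (it removes every stopped container on the host, not only n8n's), then docker compose up -d, which recreates the n8n container with the new values. If you change N8N_PORT, update the service's ports mapping to match.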
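Comparing the two grep outputs from the diagnostic step by eye gets tedious. Here is a minimal sketch that does the comparison for you. env_mismatches is a hypothetical helper, and it assumes both files contain plain KEY=VALUE lines without quoting.

```shell
# env_mismatches EXPECTED_FILE ACTUAL_FILE
# Hypothetical helper: for every KEY=VALUE line in EXPECTED_FILE, print the
# key if ACTUAL_FILE has a different value for it (or no value at all).
env_mismatches() {
  expected=$1
  actual=$2
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$expected" | while IFS='=' read -r key want; do
    # First matching line in the actual file, value part only
    got=$(grep -E "^${key}=" "$actual" | head -n 1 | cut -d= -f2-)
    [ "$want" = "$got" ] || printf '%s: expected=%s actual=%s\n' "$key" "$want" "$got"
  done
}

# Usage (container name as used in the diagnostic commands above):
#   docker exec n8n-container env > /tmp/n8n-actual.env
#   env_mismatches .env /tmp/n8n-actual.env
```

No output means every key in .env reached the container with the value you expected.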

Pitfall 3: Langfuse v3 Mandates Redis; Skipping It Crashes on Startup

I read the Langfuse docs which said "Redis is optional", assumed it would run without Redis, but the startup log was screaming:

Error: Redis is required for Langfuse v3
Cannot start application without Redis connection

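Root cause: Langfuse v3 promoted Redis from an optional cache layer to a required component. v3 uses Redis as the event queue between the web and worker containers, as a cache, and for rate-limiting counters. v2 could get by with in-memory state at low traffic, but v3 will not start without a Redis connection. The docs mention this in the v2→v3 upgrade notes, but the homepage quick start had not been updated. The v3 self-hosting docs also list ClickHouse and S3-compatible blob storage as required components, so read that list in full before deploying.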

Diagnostic command:

# Check the specific Redis error in Langfuse startup log
docker logs langfuse-web 2>&1 | grep -i redis | head -10
# Check if compose is missing the redis service
docker compose ps | grep redis

Fix: Add Redis to compose (recommend Redis 7-alpine, memory usage <30MB):

services:
  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis-data:/data
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

volumes:
  redis-data:

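Add REDIS_URL=redis://redis:6379 to the Langfuse web/worker environment variables, then restart the entire stack. Langfuse returns to normal as soon as Redis is up. Confirm the variable name against the current Langfuse configuration reference, which documents REDIS_CONNECTION_STRING and REDIS_HOST/REDIS_PORT.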
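When restarting the whole stack, Langfuse can come up before Redis accepts connections. The following is a small sketch of a wait loop you could run between starting Redis and starting Langfuse. wait_for is a hypothetical helper, and the usage line assumes the Compose service is called redis as in the snippet above.

```shell
# wait_for TRIES COMMAND [ARGS...]
# Hypothetical helper: run COMMAND once per second until it succeeds or
# TRIES attempts have failed. Returns 0 on success, 1 on timeout.
wait_for() {
  tries=$1
  shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage: block for up to 30 seconds until Redis answers PING
#   docker compose up -d redis
#   wait_for 30 docker compose exec -T redis redis-cli ping \
#     && docker compose up -d langfuse-web langfuse-worker
```

A healthcheck plus depends_on in the compose file achieves the same thing declaratively; the loop is just the quickest thing to try from a shell.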

Pitfall 4: 2GB VPS Running Full n8n+Langfuse+PostgreSQL+Redis Stack Gets OOM Killed

My Tencent Cloud Lighthouse instance had the standard 2GB of memory. After running n8n + Langfuse web + Langfuse worker + PostgreSQL 15 + Redis 7 for less than 2 hours, the kernel SIGKILLed everything. dmesg showed a wall of Out of memory: Killed process.

Root cause: By default the Langfuse worker starts CPU count * 2 worker processes. On a 2-core VPS that means 4 workers, each requesting a 512MB heap on startup, so the workers alone consume 2GB. Add PostgreSQL shared_buffers (256MB), n8n itself (300MB), Redis (100MB), and the system (400MB), and the starting budget is about 3.1GB. A 2GB box is guaranteed to run out.
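Adding up the per-component figures quoted above makes the gap obvious. This is plain shell arithmetic on those estimates, nothing measured:

```shell
# Per-component estimates in MB, taken from the paragraph above
worker_count=4        # 2 cores * 2
worker_heap_mb=512
postgres_mb=256       # shared_buffers
n8n_mb=300
redis_mb=100
system_mb=400

workers_mb=$((worker_count * worker_heap_mb))
total_mb=$((workers_mb + postgres_mb + n8n_mb + redis_mb + system_mb))

echo "workers alone: ${workers_mb} MB"
echo "startup budget: ${total_mb} MB on a 2048 MB box"
```

The workers account for 2048 MB by themselves, and the full budget comes to 3104 MB, roughly one and a half times what the VPS has.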

Diagnostic command:

# Memory usage per process
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
# Kernel OOM records
dmesg | grep -i "out of memory" | tail -20
# Langfuse worker process count
docker exec langfuse-worker sh -c "ps aux | grep -E 'langfuse|node' | grep -v grep | wc -l"

Fix: Three steps to compress total memory usage to within 1.5GB:

services:
  postgres:
    image: postgres:15-alpine
    # The official postgres image has no env vars for these settings,
    # so pass them to the server as -c flags instead
    command: postgres -c shared_buffers=128MB -c work_mem=8MB -c effective_cache_size=512MB
    # Key: cap total memory
    deploy:
      resources:
        limits:
          memory: 400M

  langfuse-worker:
    image: langfuse/langfuse-worker:latest
    environment:
      - LANGFUSE_WORKER_CONCURRENCY=1  # Drop from default 4 to 1
      - NODE_OPTIONS=--max-old-space-size=384  # Cap single process heap at 384MB
    deploy:
      resources:
        limits:
          memory: 512M

The key is **adding a deploy.resources.limits.memory hard limit to every service**. When a container exceeds its limit, the kernel kills that container's process instead of letting it eat the host's memory, and a restart policy such as restart: always brings it back. Also drop LANGFUSE_WORKER_CONCURRENCY from the default CPU*2 to 1, and use NODE_OPTIONS=--max-old-space-size=384 to cap the per-process heap. After this change I ran for 7 days with zero OOM.
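It is worth confirming that the limit actually reached the container, because the legacy docker-compose v1 binary ignores the deploy section unless run with --compatibility. A minimal sketch, assuming the container names used in the diagnostic commands above (both function names are hypothetical):

```shell
# Convert a byte count (as printed by docker inspect) to whole MiB.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Print the memory limit Docker applied to a container, in MiB.
# A result of 0 means no limit is in effect.
applied_limit_mib() {
  bytes=$(docker inspect --format '{{.HostConfig.Memory}}' "$1") || return 1
  bytes_to_mib "$bytes"
}

# Usage (check `docker ps` for the real container name on your host):
#   applied_limit_mib langfuse-worker
```

If this prints 0 for a service that has a limit in the compose file, the limit was not applied and the container can still take down the host.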

Pitfall 5: PostgreSQL Connection Pool Exhausted, Langfuse Logs "remaining connection slots are reserved"

During peak hours (n8n running 10+ concurrent workflows) Langfuse web logged:

Error: too many clients already
remaining connection slots are reserved for non-replication superuser connections

**Root cause**: PostgreSQL defaults to max_connections=100, but Langfuse internally **opens a new connection for every HTTP request** (no PgBouncer in the path). Combined with n8n's workflow executions also connecting directly to PostgreSQL to write execution records, **concurrent load quickly maxes out the 100 connections**. PostgreSQL reserves 3 of them for superusers by default (superuser_reserved_connections), leaving 97 for everyone else. Once those are occupied, every new connection is rejected.

Diagnostic command:

# Current connection count
docker exec postgres psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
# Connection source grouping
docker exec postgres psql -U postgres -c "SELECT application_name, count(*) FROM pg_stat_activity GROUP BY application_name;"
# Langfuse application connections
docker exec postgres psql -U postgres -c "SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE 'langfuse%';"

Fix: Front PostgreSQL with PgBouncer in transaction pooling mode:

services:
  pgbouncer:
    image: edoburu/pgbouncer:latest
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_USER=postgres
      - DB_PASSWORD=${POSTGRES_PASSWORD}
      - LISTEN_PORT=6432            # Set explicitly so it matches the DATABASE_URL port
      - POOL_MODE=transaction
      - MAX_CLIENT_CONN=1000        # Client connection cap
      - DEFAULT_POOL_SIZE=20        # Server connections per database/user pair (20 << 100)
      - RESERVE_POOL_SIZE=5
      - SERVER_IDLE_TIMEOUT=300
    deploy:
      resources:
        limits:
          memory: 64M

  langfuse-web:
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres
  langfuse-worker:
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres

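**Key point**: DEFAULT_POOL_SIZE=20 is far below PostgreSQL's max_connections=100, leaving plenty of headroom for n8n, admin sessions, and monitoring tools. Langfuse now connects to pgbouncer:6432 instead of going directly to postgres:5432. After the change, up to 1000 client connections share 20 real ones (plus up to 5 from the reserve pool), and the connection-exhaustion error disappears completely. One caveat: transaction pooling does not suit schema migrations, so check the Langfuse docs for DIRECT_URL, which points migrations at PostgreSQL directly when DATABASE_URL goes through a pooler.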
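The headroom claim is simple to check with arithmetic. The sketch below assumes a single database/user pair behind PgBouncer (pools are sized per pair) and uses the PostgreSQL defaults mentioned earlier; pg_headroom is a hypothetical helper name.

```shell
# pg_headroom MAX_CONNECTIONS SUPERUSER_RESERVED DEFAULT_POOL_SIZE RESERVE_POOL_SIZE
# Hypothetical helper: connections left for clients that bypass PgBouncer
# (n8n, psql sessions, monitoring) when the pool is fully used.
pg_headroom() {
  echo $(( $1 - $2 - $3 - $4 ))
}

# max_connections=100, 3 reserved for superusers,
# DEFAULT_POOL_SIZE=20 and RESERVE_POOL_SIZE=5 from the compose snippet
pg_headroom 100 3 20 5
```

With these numbers 72 slots remain even when PgBouncer has opened every connection it is allowed to, which is why the pooled setup stops hitting the limit.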

Complete docker-compose.yml Verification Checklist

Here's the final compose file structure (with sensitive info redacted) after I solved all 5 pitfalls. It has been running 14 days with zero failures:

version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    # Tuning goes in as server flags; the official image has no env vars for it
    command: postgres -c shared_buffers=128MB -c work_mem=8MB
    volumes:
      - postgres-data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          memory: 400M
    networks:
      - obs-net

  redis:
    image: redis:7-alpine
    restart: always
    # Keep maxmemory below the 128M container limit set under deploy
    command: redis-server --maxmemory 96mb --maxmemory-policy allkeys-lru
    volumes:
      - redis-data:/data
    deploy:
      resources:
        limits:
          memory: 128M
    networks:
      - obs-net

  pgbouncer:
    image: edoburu/pgbouncer:latest
    restart: always
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_USER=postgres
      - DB_PASSWORD=${POSTGRES_PASSWORD}
      - LISTEN_PORT=6432
      - POOL_MODE=transaction
      - MAX_CLIENT_CONN=1000
      - DEFAULT_POOL_SIZE=20
    deploy:
      resources:
        limits:
          memory: 64M
    networks:
      - obs-net

  langfuse-web:
    image: langfuse/langfuse-web:latest
    restart: always
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres
      - REDIS_URL=redis://redis:6379
      - NEXTAUTH_SECRET=${NEXTAUTH_SECRET}
      - SALT=${LANGFUSE_SALT}
      - LANGFUSE_INIT_ORG_ID=techpassive
      - LANGFUSE_INIT_PROJECT_ID=n8n-observability
    deploy:
      resources:
        limits:
          memory: 512M
    networks:
      - obs-net

  langfuse-worker:
    image: langfuse/langfuse-worker:latest
    restart: always
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres
      - REDIS_URL=redis://redis:6379
      - LANGFUSE_WORKER_CONCURRENCY=1
      - NODE_OPTIONS=--max-old-space-size=384
    deploy:
      resources:
        limits:
          memory: 512M
    networks:
      - obs-net

  n8n:
    image: n8nio/n8n:latest
    restart: always
    environment:
      - N8N_PORT=5678
      - WEBHOOK_URL=https://n8n.your-domain.com/
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=postgres
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
      - EXECUTIONS_DATA_PRUNE=true
    volumes:
      - n8n-data:/home/node/.n8n
    deploy:
      resources:
        limits:
          memory: 512M
    networks:
      - obs-net

volumes:
  postgres-data:
  redis-data:
  n8n-data:

networks:
  obs-net:
    driver: bridge

Final memory footprint (2GB VPS): postgres 380MB + redis 110MB + pgbouncer 50MB + langfuse-web 480MB + langfuse-worker 480MB + n8n 480MB = about 1.98GB for the containers alone, right at the edge of 2GB before the system's own 400MB is counted (about 2.4GB in total if everything peaks at once). Recommend upgrading to 4GB for headroom.
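The configured limits tell the same story as the measured footprint. Here is a short sketch that adds up the memory: values from the compose file above; total_limit_mb is a hypothetical helper that understands only the M and G suffixes.

```shell
# total_limit_mb LIMIT...
# Hypothetical helper: add up compose-style memory limits (e.g. 400M, 1G)
# and print the total in MB.
total_limit_mb() {
  total=0
  for limit in "$@"; do
    case "$limit" in
      *[Gg]) total=$((total + ${limit%[Gg]} * 1024)) ;;
      *[Mm]) total=$((total + ${limit%[Mm]})) ;;
      *) echo "unsupported unit: $limit" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$total"
}

# Limits in service order:
# postgres, redis, pgbouncer, langfuse-web, langfuse-worker, n8n
limits=$(total_limit_mb 400M 128M 64M 512M 512M 512M)
echo "sum of limits: ${limits} MB; physical RAM: 2048 MB"
```

The limits add up to 2128 MB, slightly more than the machine has, so they protect the host from any single runaway service but not from all services peaking together.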

Is This Stack Worth It?

Good fit if:

Not a good fit if:

Extra value for developers: Langfuse trace data shows the exact prompt, response, and tool calls for every LLM call, far more detail than the OpenAI dashboard exposes. For debugging agent behavior, that is hard to replace.

---

5 pitfalls one-line summary:

1. 502: Use service names across containers (not localhost) + same network

2. **Env vars**: Use environment not env_file, prune residual containers

3. Redis: Langfuse v3 mandates it; the quick start doesn't say so, but the v2→v3 upgrade notes do

4. **OOM**: Add deploy.resources.limits.memory hard cap to every service, set LANGFUSE_WORKER_CONCURRENCY=1

5. **PostgreSQL connection pool**: Front with PgBouncer in transaction mode, DEFAULT_POOL_SIZE=20

If you're considering self-hosting an LLM observability stack, first check whether your VPS has enough memory, then decide between Langfuse Cloud and self-hosting. 2GB is the bare minimum; 4GB gives you headroom.
