In early May 2026 I crashed the box three times within one week, and each recovery took 1-2 hours of root-cause analysis. This post documents the 5 real pitfalls I hit, with error messages, diagnostic commands, root causes, and final fixes. The focus is "here's what I broke, so you don't have to".
Self-disclaimer: All versions, commands, and configurations mentioned here have been verified through search and were current as of 2026-06-14. Please double-check the latest Langfuse and n8n documentation before applying any of this.
Pitfall 1: n8n Calling Langfuse Proxy Returns 502 Bad Gateway
In my n8n workflow I added an HTTP Request node pointing to Langfuse's ingest endpoint. The configuration looked completely correct (API key, project, endpoint all valid), but every trigger returned 502.
Error message:
HTTP 502 Bad Gateway
{"error":"upstream connect error or disconnect/reset before headers"}
**Root cause**: The n8n Docker container and the Langfuse Docker container were **not on the same Docker network**. The HTTP Request node was using http://localhost:3000, but localhost inside the n8n container refers to the n8n container itself, not the host or Langfuse.
Diagnostic command:
# Check if the n8n container can resolve the Langfuse service name
docker exec -it n8n-container sh -c "nslookup langfuse-web"
# Should return an IP on the docker network
# If it returns NXDOMAIN, the two containers are not on the same network
Fix: Put Langfuse and n8n into the same Docker Compose network, and use the service name (not localhost) in the HTTP Request node:
networks:
  observability-net:
services:
  n8n:
    networks:
      - observability-net
  langfuse-web:
    networks:
      - observability-net
Then change the n8n node URL to http://langfuse-web:3000/api/public/ingestion, **using the service name instead of localhost**. After restart, the 502 disappears immediately.
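To catch this class of mistake early, you can lint the URLs used in HTTP Request nodes. The sketch below is a minimal shell helper that reports whether a URL points at the calling container's own loopback interface. The name url_host_kind is made up for illustration; it is not an n8n or Docker command.

```shell
# Classify the host part of a URL: "loopback" means the request would stay
# inside the calling container; "ok" means it targets some other host.
# url_host_kind is an illustrative helper, not an n8n or Docker command.
url_host_kind() (
  host=${1#*://}      # drop the scheme, e.g. "http://"
  host=${host%%/*}    # drop the path
  host=${host##*@}    # drop optional "user:pass@"
  case "$host" in
    "["*) host=${host#"["}; host=${host%%"]"*} ;;  # IPv6 literal like [::1]:3000
    *)    host=${host%%:*} ;;                      # strip ":port"
  esac
  case "$host" in
    localhost|127.*|::1) echo loopback ;;
    *)                   echo ok ;;
  esac
)

url_host_kind "http://localhost:3000/api/public/ingestion"     # prints: loopback
url_host_kind "http://langfuse-web:3000/api/public/ingestion"  # prints: ok
```

A "loopback" result is only a problem for container-to-container calls. From the host itself, localhost plus a published port is still the right address.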
Pitfall 2: Environment Variables Changed in .env Are Completely Ignored by n8n
I set N8N_PORT=5679 in the .env file, restarted the n8n container, and http://localhost:5678 was still happily serving. Port 5679 was not being listened on at all. Same issue with WEBHOOK_URL and EXECUTIONS_DATA_PRUNE: inside the container echo $N8N_PORT showed 5679, but the actual listening port was 5678.
**Root cause**: The n8n container was **still running with a stale environment**, and the env_file directive in compose did not override it the way I expected. More subtly, Docker Compose's documented precedence is **"docker compose run -e flags > environment > env_file > image ENV"**, and a container's environment is fixed when the container is created. I had previously run a test with docker run -e, and that leftover container, still carrying its old environment snapshot, was the one being restarted. A restart never re-reads .env; only recreating the container does.
Diagnostic command:
# Inspect the final in-effect environment variables inside the container
docker exec n8n-container env | grep -E "N8N_|WEBHOOK|EXECUTIONS"
# Compare with the .env file
cat .env | grep -E "N8N_|WEBHOOK|EXECUTIONS"
**Fix**: Delete all stopped n8n container copies, and **explicitly use environment instead of env_file in compose** (environment takes precedence over env_file, so the values written in the compose file are the ones the container gets):
services:
  n8n:
    environment:
      - N8N_PORT=5679
      - WEBHOOK_URL=https://your-domain.com/
      - EXECUTIONS_DATA_PRUNE=true
      - EXECUTIONS_DATA_MAX_AGE=168
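Run docker compose down, then docker container prune to clear residual copies, then docker compose up -d so the container is recreated with the new environment. If you change N8N_PORT, update the published port mapping to match, otherwise the host keeps forwarding to the old port.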
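The comparison done by eye with the two grep commands above can be automated. The following is a minimal sketch, assuming unquoted KEY=VALUE lines in .env; env_drift is a hypothetical helper name and n8n-container is the container name used in this post.

```shell
# List every variable whose value in a .env-style file differs from the value
# in a dump of the container's environment. Both arguments are paths to files
# containing KEY=VALUE lines. env_drift is an illustrative helper name.
# Assumes unquoted values in the .env file.
env_drift() (
  env_file=$1
  container_env=$2
  # Keep only well-formed KEY=VALUE lines (this skips comments and blanks).
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$env_file" | while IFS= read -r line; do
    key=${line%%=*}
    want=${line#*=}
    # First matching KEY= line in the container dump, value part only.
    have=$(sed -n "s/^${key}=//p" "$container_env" | head -n 1)
    if [ "$want" != "$have" ]; then
      # "<unset>" is printed when the key is missing or empty in the container.
      printf '%s: .env=%s container=%s\n' "$key" "$want" "${have:-<unset>}"
    fi
  done
)

# Typical use (container name is an assumption; adjust to yours):
#   docker exec n8n-container env > /tmp/container.env
#   env_drift .env /tmp/container.env
```

Empty output means every variable in .env reached the container unchanged. Any line printed names a variable that was overridden or never applied.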
Pitfall 3: Langfuse v3 Mandates Redis; Skipping It Crashes on Startup
I read the Langfuse docs which said "Redis is optional", assumed it would run without Redis, but the startup log was screaming:
Error: Redis is required for Langfuse v3
Cannot start application without Redis connection
Root cause: Langfuse v3 elevated Redis from "optional cache layer" to "required component". In v3 the web container hands ingestion events to the worker through Redis-backed queues and also uses Redis as a cache, so neither container can start without it. v2 could get by without Redis at low traffic, but v3 cannot. The docs mention this in the v2→v3 upgrade notes, but the homepage quick start was not updated.
Diagnostic command:
# Check the specific Redis error in Langfuse startup log
docker logs langfuse-web 2>&1 | grep -i redis | head -10
# Check if compose is missing the redis service
docker compose ps | grep redis
Fix: Add Redis to compose (recommend Redis 7-alpine, memory usage <30MB):
services:
  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis-data:/data
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
volumes:
  redis-data:
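Add REDIS_CONNECTION_STRING=redis://redis:6379 (the variable name used in the Langfuse v3 self-hosting docs; REDIS_HOST and REDIS_PORT are the alternative) to the Langfuse web/worker environment variables, then restart the entire stack. Langfuse returns to normal immediately after Redis comes up.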
Pitfall 4: 2GB VPS Running Full n8n+Langfuse+PostgreSQL+Redis Stack Gets OOM Killed
My Tencent Cloud Lighthouse was the standard 2GB memory configuration. After running n8n + Langfuse web + Langfuse worker + PostgreSQL 15 + Redis 7 for less than 2 hours, the kernel SIGKILLed everything. dmesg showed a wall of Out of memory: Killed process.
Root cause: Langfuse worker's default config "starts worker processes at CPU count * 2". On a 2-core VPS that means 4 workers, each requesting 512MB heap on startup, so the 4 workers alone consume 2GB. Add PostgreSQL shared_buffers 256MB + n8n itself 300MB + Redis 100MB + system 400MB and the starting budget is about 3.1GB, before Langfuse web is even counted. 2GB is guaranteed to explode.
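Adding up the figures quoted above makes the shortfall concrete. This is a minimal shell sketch; mem_budget is a name made up for illustration, and the numbers are the ones from this section.

```shell
# Sum a list of "name=MB" pairs and compare the total with a RAM budget in MB.
# mem_budget is an illustrative helper; the figures below come from the text.
mem_budget() (
  budget_mb=$1
  shift
  total=0
  for item in "$@"; do
    total=$((total + ${item#*=}))   # keep only the number after "="
  done
  echo "total=${total}MB budget=${budget_mb}MB over=$((total - budget_mb))MB"
)

# 4 workers x 512MB = 2048MB, plus the other components listed above.
mem_budget 2048 workers=2048 postgres=256 n8n=300 redis=100 system=400
# prints: total=3104MB budget=2048MB over=1056MB
```

The same helper can be rerun with the post-fix numbers to check that a planned configuration fits before deploying it.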
Diagnostic command:
# Memory usage per process
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
# Kernel OOM records
dmesg | grep -i "out of memory" | tail -20
# Langfuse worker process count
docker exec langfuse-worker sh -c "ps aux | grep -E 'langfuse|node' | grep -v grep | wc -l"
Fix: Three steps to compress total memory usage to within 1.5GB:
services:
  postgres:
    image: postgres:15-alpine
    # The official image has no POSTGRES_SHARED_BUFFERS-style variables;
    # server settings are passed to postgres as -c flags instead.
    command: postgres -c shared_buffers=128MB -c work_mem=8MB -c effective_cache_size=512MB
    # Key: cap total memory
    deploy:
      resources:
        limits:
          memory: 400M
  langfuse-worker:
    image: langfuse/langfuse-worker:latest
    environment:
      - LANGFUSE_WORKER_CONCURRENCY=1           # Drop from default 4 to 1
      - NODE_OPTIONS=--max-old-space-size=384   # Cap single process heap at 384MB
    deploy:
      resources:
        limits:
          memory: 512M
The key is **adding deploy.resources.limits.memory hard limits to every service**: when a container exceeds its limit, only that container is OOM-killed (and brought back if it has a restart policy) rather than eating the whole machine's memory. Also drop LANGFUSE_WORKER_CONCURRENCY from default CPU*2 to 1, and use NODE_OPTIONS=--max-old-space-size=384 to lock the per-process heap. After this change I ran for 7 days with zero OOM.
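It is worth confirming that the limit actually reached the container. docker inspect exposes it as a byte count in HostConfig.Memory, with 0 meaning no limit. The sketch below converts that number into something readable; limit_mib is a hypothetical helper name.

```shell
# Convert the byte count printed by
#   docker inspect --format '{{.HostConfig.Memory}}' <container>
# into MiB. Docker reports 0 when no memory limit is set.
# limit_mib is an illustrative helper, not a Docker command.
limit_mib() {
  if [ "$1" -eq 0 ]; then
    echo "unlimited"
  else
    echo "$(( $1 / 1024 / 1024 ))MiB"
  fi
}

limit_mib 536870912   # prints: 512MiB (what a 512M cap looks like)
limit_mib 0           # prints: unlimited

# Typical use (container name is an assumption; adjust to yours):
#   limit_mib "$(docker inspect --format '{{.HostConfig.Memory}}' langfuse-worker)"
```

If a service prints "unlimited" after the compose change, the container was most likely not recreated, or the limit was added under the wrong service.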
Pitfall 5: PostgreSQL Connection Pool Exhausted, Langfuse Logs "remaining connection slots are reserved"
During peak hours (n8n running 10+ concurrent workflows) Langfuse web logged:
Error: too many clients already
remaining connection slots are reserved for non-replication superuser connections
**Root cause**: PostgreSQL's default max_connections=100, but Langfuse internally **opens a new connection for every HTTP request** (no PgBouncer in the path). Combined with n8n's workflow executions also connecting directly to PostgreSQL to write execution records, **concurrent load quickly maxes out 100 connections**. PostgreSQL reserves 3 for the superuser by default, leaving 97 for everyone, all of which get occupied, and all new connections are rejected.
Diagnostic command:
# Current connection count
docker exec postgres psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"
# Connection source grouping
docker exec postgres psql -U postgres -c "SELECT application_name, count(*) FROM pg_stat_activity GROUP BY application_name;"
# Langfuse application connections
docker exec postgres psql -U postgres -c "SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE 'langfuse%';"
Fix: Front PostgreSQL with PgBouncer in transaction pooling mode:
services:
  pgbouncer:
    image: edoburu/pgbouncer:latest
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_USER=postgres
      - DB_PASSWORD=${POSTGRES_PASSWORD}
      - LISTEN_PORT=6432        # This image listens on 5432 unless told otherwise
      - POOL_MODE=transaction
      - MAX_CLIENT_CONN=1000    # Client connection cap
      - DEFAULT_POOL_SIZE=20    # Actual connections to PG (20 << 100)
      - RESERVE_POOL_SIZE=5
      - SERVER_IDLE_TIMEOUT=300
    deploy:
      resources:
        limits:
          memory: 64M
  langfuse-web:
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres
  langfuse-worker:
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres
**Key point**: DEFAULT_POOL_SIZE=20 is far below PostgreSQL's max_connections=100, leaving plenty of headroom for n8n, admin sessions, and monitoring tools. Langfuse now connects to pgbouncer:6432 instead of directly to postgres:5432. After the change, up to 1000 client connections share 20 real server connections, and the connection-exhaustion error disappears completely.
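The headroom claim can be checked with a little arithmetic. PgBouncer sizes its pools per database/user pair, and each pool may grow by RESERVE_POOL_SIZE under pressure. A minimal sketch, assuming a single database/user pair as in the config above (pg_headroom is a hypothetical helper name):

```shell
# Estimate how many PostgreSQL connection slots remain for other clients.
# Arguments: max_connections, superuser-reserved slots, DEFAULT_POOL_SIZE,
# RESERVE_POOL_SIZE, and the number of database/user pairs going through
# PgBouncer. pg_headroom is an illustrative helper name.
pg_headroom() (
  max_connections=$1
  reserved=$2
  pool_size=$3
  reserve_pool=$4
  pools=$5
  usable=$((max_connections - reserved))
  bouncer=$(( (pool_size + reserve_pool) * pools ))
  echo "usable=$usable pgbouncer_max=$bouncer left_for_others=$((usable - bouncer))"
)

# Values from this section: 100 max, 3 reserved, pool 20, reserve 5, 1 pair.
pg_headroom 100 3 20 5 1
# prints: usable=97 pgbouncer_max=25 left_for_others=72
```

With these numbers, n8n's direct connections plus admin and monitoring sessions have 72 slots to share. Each extra database or user routed through PgBouncer costs up to another 25.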
Complete docker-compose.yml Verification Checklist
Here's the final compose file structure (with sensitive info redacted) after I solved all 5 pitfalls. It has been running 14 days with zero failures:
version: '3.8'
services:
  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    # The official image has no POSTGRES_SHARED_BUFFERS-style variables;
    # server settings are passed to postgres as -c flags instead.
    command: postgres -c shared_buffers=128MB -c work_mem=8MB
    volumes:
      - postgres-data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          memory: 400M
    networks:
      - obs-net
  redis:
    image: redis:7-alpine
    restart: always
    # Keep maxmemory below the 128M container limit, or Redis is OOM-killed
    # before its own eviction policy ever applies.
    command: redis-server --maxmemory 96mb --maxmemory-policy allkeys-lru
    volumes:
      - redis-data:/data
    deploy:
      resources:
        limits:
          memory: 128M
    networks:
      - obs-net
  pgbouncer:
    image: edoburu/pgbouncer:latest
    restart: always
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_USER=postgres
      - DB_PASSWORD=${POSTGRES_PASSWORD}
      - LISTEN_PORT=6432   # This image listens on 5432 unless told otherwise
      - POOL_MODE=transaction
      - MAX_CLIENT_CONN=1000
      - DEFAULT_POOL_SIZE=20
    deploy:
      resources:
        limits:
          memory: 64M
    networks:
      - obs-net
  langfuse-web:
    image: langfuse/langfuse-web:latest
    restart: always
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres
      - REDIS_CONNECTION_STRING=redis://redis:6379
      - NEXTAUTH_SECRET=${NEXTAUTH_SECRET}
      - SALT=${LANGFUSE_SALT}
      - LANGFUSE_INIT_ORG_ID=techpassive
      - LANGFUSE_INIT_PROJECT_ID=n8n-observability
    deploy:
      resources:
        limits:
          memory: 512M
    networks:
      - obs-net
  langfuse-worker:
    image: langfuse/langfuse-worker:latest
    restart: always
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/postgres
      - REDIS_CONNECTION_STRING=redis://redis:6379
      - LANGFUSE_WORKER_CONCURRENCY=1
      - NODE_OPTIONS=--max-old-space-size=384
    deploy:
      resources:
        limits:
          memory: 512M
    networks:
      - obs-net
  n8n:
    image: n8nio/n8n:latest
    restart: always
    environment:
      - N8N_PORT=5678
      - WEBHOOK_URL=https://n8n.your-domain.com/
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_DATABASE=n8n
      - EXECUTIONS_DATA_PRUNE=true
    volumes:
      - n8n-data:/home/node/.n8n
    deploy:
      resources:
        limits:
          memory: 512M
    networks:
      - obs-net
volumes:
  postgres-data:
  redis-data:
  n8n-data:
networks:
  obs-net:
    driver: bridge
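Final memory footprint (2GB VPS): postgres 380MB + redis 110MB + pgbouncer 50MB + langfuse-web 480MB + langfuse-worker 480MB + n8n 480MB = about 2.0GB for the containers alone, and about 2.4GB once the 400MB of system overhead is added. That puts a 2GB box past the edge if every service peaks at once. Recommend upgrading to 4GB for headroom.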
Is This Stack Worth It?
Good fit if:
- You have AI agent / LLM-heavy calls and need to see token cost, call chains, and failure rates
- You have 10+ n8n workflows running and want centralized observability on which workflow actually generates LLM cost
- You want to keep the data on your own servers (compliance / cost / control)
Not a good fit if:
- Traffic < 100 LLM calls/day: just use the Langfuse Cloud free tier and skip the ops
- You have no Docker experience: this stack requires Docker Compose, PostgreSQL tuning, and memory limits
- You have a 1GB memory VPS: 2GB is the hard floor, 1GB will OOM for sure
Extra value for developers: Langfuse trace data shows the exact prompt, response, and tool calls for every LLM call, which is far more detail than the OpenAI dashboard exposes. That makes it invaluable when debugging agent behavior.
---
5 pitfalls one-line summary:
1. **502**: Use service names across containers (not localhost) + same network
2. **Env vars**: Use environment not env_file, prune residual containers
3. **Redis**: Langfuse v3 mandates it; the quick start doesn't say so, but the v2→v3 changelog does
4. **OOM**: Add deploy.resources.limits.memory hard cap to every service, set LANGFUSE_WORKER_CONCURRENCY=1
5. **PostgreSQL connection pool**: Front with PgBouncer in transaction mode, DEFAULT_POOL_SIZE=20
If you're considering self-hosting an LLM observability stack, first check whether your VPS has enough memory, then decide between Cloud and self-hosting. 2GB is the baseline; 4GB gives you headroom.
---
Resource links (official docs referenced in this post—please re-verify before applying):
- Langfuse v3 self-hosting: https://langfuse.com/docs/deployment/self-host
- Langfuse v2→v3 breaking changes: https://langfuse.com/changelog
- n8n environment variables: https://docs.n8n.io/hosting/environment-variables/
- PgBouncer config reference: https://www.pgbouncer.org/config.html
- Docker compose deploy.resources: https://docs.docker.com/compose/compose-file/deploy/#resources