Why I put n8n and Langfuse together
In April I wired up 3 LLM workflows in n8n (WeChat article summarizer, customer email responder, PDF knowledge base Q&A), each with 7-12 nodes. I hit 3 real problems I had to solve:
1. The WeChat summarizer workflow failed every Tuesday around 3 AM. n8n only showed "OpenAI node errored" — I couldn't tell if my prompt changed or if I hit the API rate limit.
2. The customer email workflow sometimes changed reply style suddenly. I suspected prompt injection or a model swap.
3. The PDF Q&A RAG pipeline had 4 nodes chained with 8s latency, but I couldn't see if the bottleneck was embedding or LLM.
Langfuse is the MIT-licensed open-source LLM observability platform (langfuse.com). The n8n ↔ Langfuse relationship had a key change in 2026: **n8n 2.x officially added native OpenTelemetry export support on 2026-04-13** (the N8N_OTEL_* environment variables). Now every node execution in n8n automatically becomes a span in Langfuse — no community node required.
But "native support" does not equal "zero configuration". I spent 3 days hitting 5 real production traps. This is the complete post-mortem.
Prerequisites and versions
Tested version matrix (as of 2026-06-19):
| Component | Version | Role |
|---|---|---|
| n8n | 2.x (N8N_OTEL_* available) | Workflow execution |
| Langfuse | v3 stable (released 2024-12-09) | LLM trace storage and visualization |
| PostgreSQL | 17 | Langfuse transactional state |
| ClickHouse | 24.x (server) | Langfuse trace analytics (new in v3) |
| Redis | 7 | Langfuse cache and queue |
| MinIO | latest (S3-compatible) | Langfuse event/media upload |
| Docker Compose | v2.20+ | Orchestration |
Official hardware recommendation is 4 cores / 16 GiB / 30 GB disk. I first got everything running on a 2-core 4 GiB dev box for 3 weeks before migrating to the recommended spec. The minimum bar to get it working is actually 2 cores 4 GiB, but for production, follow the official recommendation.
5 real production traps
Trap 1: ClickHouse image won't start on ARM Mac
**Symptom**: docker compose up shows clickhouse-1 restarting in a loop, with Illegal instruction (core dumped).
**Root cause**: Langfuse v3's default docker-compose.yml uses docker.io/clickhouse/clickhouse-server:latest (x86_64 image). Apple Silicon (M1/M2/M3/M4) cannot run ClickHouse's SSE4.2 instruction set optimization after pulling that image.
Fix:
services:
clickhouse:
image: clickhouse/clickhouse-server:latest # Official multi-arch image, not docker.io prefix
environment:
CLICKHOUSE_DB: default
CLICKHOUSE_USER: clickhouse
CLICKHOUSE_PASSWORD: clickhouse # CHANGEME
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
Note the change from docker.io/clickhouse/clickhouse-server to clickhouse/clickhouse-server. The latter has the ARM64 manifest. Verify with:
docker manifest inspect clickhouse/clickhouse-server:latest | grep -A1 arm64
If you see an architecture: arm64 block, the multi-arch image is in place.
Trap 2: langfuse-worker hangs for 5 minutes without "Ready"
**Symptom**: langfuse-worker-1 logs stop at Running database migrations... with no further output. langfuse-web-1 also hangs at Connecting to database.
Root cause: Langfuse v3 runs background migrations on first start (upgrading the v2 Postgres schema to v3). On v3's first start it also initializes ClickHouse table structure. These two steps combined take 4-6 minutes on a 2-core machine. It's not dead, it's waiting.
Fix: Set your timeout threshold to 10 minutes (never less than 6). Then run 3-stage verification:
# Step 1: worker runs migrations
docker logs -f langfuse-worker-1 | grep -E "migration|ready|listening"
# Step 2: ClickHouse tables created
docker exec langfuse-clickhouse-1 clickhouse-client -q "SHOW TABLES FROM default"
# Step 3: web started
curl -s http://localhost:3000/api/public/health | jq .
A successful sequence ends with: langfuse-web-1 | ✓ Ready in 2.3s. **If you see langfuse-web-1 | ✓ Compiled successfully instead, that's just Next.js finishing compilation — the OTLP endpoint isn't up yet**.
Trap 3: n8n's OTEL endpoint points to localhost but runs in Docker
**Symptom**: n8n logs show OpenTelemetry: Exporter failed: ECONNREFUSED 127.0.0.1:4318, but the browser can reach Langfuse just fine.
**Root cause**: When n8n also runs in Docker, localhost and 127.0.0.1 point to the n8n container itself, not the host's Langfuse. Both containers must reach each other on the same Docker network.
**Fix**: Put n8n and langfuse-web on the same network via docker-compose (or use the default network shared by both compose files). I use a second compose file docker-compose.n8n.yml sharing the langfuse_default network:
# docker-compose.n8n.yml
services:
n8n:
image: n8nio/n8n:2
networks:
- langfuse_default # Default network created by Langfuse compose
environment:
N8N_OTEL_ENABLED: "true"
N8N_OTEL_EXPORTER_OTLP_ENDPOINT: "http://langfuse-web:3000"
N8N_OTEL_EXPORTER_OTLP_TRACING_PATH: "/api/public/otel/v1/traces"
N8N_OTEL_TRACES_INCLUDE_NODE_SPANS: "true"
N8N_OTEL_TRACES_PRODUCTION_ONLY: "false"
networks:
langfuse_default:
external: true
name: langfuse_default # Must match the network name created by Langfuse compose
Start order matters: first docker compose -f docker-compose.yml up -d (Langfuse), then docker compose -f docker-compose.n8n.yml up -d (n8n). Otherwise the second one will fail with network langfuse_default not found.
Trap 4: Traces reach Langfuse but no LLM token/cost stats
Symptom: In Langfuse UI you can see spans for every n8n node (HTTP Request, Set, Code), but the OpenAI/Anthropic nodes show 0 tokens and 0 cost.
**Root cause**: n8n's OTEL exporter only handles "exporting span frames" — it does **not parse the LLM response body's usage field**. Langfuse needs input_tokens, output_tokens and the model name to compute cost.
Fix: Choose one of two paths:
**Path A: Community node rorubyy/n8n-nodes-openai-langfuse** (n8n ≥ 0.187). It calls Langfuse's ingestion API directly from inside the OpenAI node, so token stats are automatic. Install:
# Inside the n8n container
docker exec -u root n8n-n8n-1 npm install -g n8n-nodes-openai-langfuse
# Then n8n Settings → Community Nodes → search "openai-langfuse" and install
Path B: Add a Code node after the HTTP Request node to manually extract usage (works for Anthropic / any compatible API):
// Code node, mode = "Run Once for Each Item"
const usage = $input.item.json.usage;
return {
json: {
langfuse_update: {
usage: {
input: usage.prompt_tokens,
output: usage.completion_tokens,
total: usage.total_tokens
},
model: $input.item.json.model
}
}
};
Then add an HTTP Request node POSTing to http://langfuse-web:3000/api/public/ingestion to update the trace.
Trap 5: All SDK v1.x calls fail after upgrading to v3
**Symptom**: Apps that used Langfuse JS SDK v1.x all return 401 after upgrading Langfuse: Authentication failed: API key not found.
Root cause: After 2024-11-11 (cloud) and the v3 self-hosted cutover, Langfuse enforces SDK v2+ API key format. SDK v1.x keys are silently rejected by the new version.
Fix:
# Upgrade SDK in existing projects
npm install @langfuse/core@latest @langfuse/tracing@latest
# Critical change: v2 SDK separates secret/public keys via env vars
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_BASEURL="http://localhost:3000" # self-hosted must set explicitly
The full v1 → v2 breaking change list is in the Langfuse v2→v3 upgrade docs. The 3 core changes: API endpoint paths become `/api/public/otel`, API key validation format changes, ingestion batch endpoint path becomes `/api/public/ingestion`.
Complete docker-compose.yml template
Here's the final version after my 3 days of debugging (comments stripped, all # CHANGEME items preserved):
# docker-compose.yml
services:
langfuse-web:
image: docker.io/langfuse/langfuse:3
ports:
- "3000:3000"
environment:
DATABASE_URL: postgresql://postgres:postgres@postgres:5432/postgres # CHANGEME
NEXTAUTH_URL: http://localhost:3000 # CHANGEME, required
NEXTAUTH_SECRET: ${NEXTAUTH_SECRET:-some-long-random-string} # CHANGEME
LANGFUSE_INIT_ORG_ID: ${LANGFUSE_INIT_ORG_ID:-default-org}
LANGFUSE_INIT_ORG_NAME: ${LANGFUSE_INIT_ORG_NAME:-Default}
LANGFUSE_INIT_PROJECT_ID: ${LANGFUSE_INIT_PROJECT_ID:-default-project}
LANGFUSE_INIT_PROJECT_SECRET_KEY: ${LANGFUSE_INIT_PROJECT_SECRET_KEY:-sk-lf-default} # CHANGEME
LANGFUSE_INIT_USER_NAME: ${LANGFUSE_INIT_USER_NAME:-admin}
LANGFUSE_INIT_USER_PASSWORD: ${LANGFUSE_INIT_USER_PASSWORD:-admin123} # CHANGEME
LANGFUSE_INIT_USER_EMAIL: ${LANGFUSE_INIT_USER_EMAIL:-admin@example.com} # CHANGEME
CLICKHOUSE_URL: http://clickhouse:8123
CLICKHOUSE_USER: clickhouse
CLICKHOUSE_PASSWORD: clickhouse # CHANGEME
CLICKHOUSE_MIGRATION_URL: clickhouse://clickhouse:9000
REDIS_CONNECTION_STRING: redis://redis:6379
LANGFUSE_S3_EVENT_UPLOAD_BUCKET: langfuse
LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID: minio
LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY: miniosecret # CHANGEME
LANGFUSE_S3_EVENT_UPLOAD_ENDPOINT: http://minio:9000
LANGFUSE_S3_EVENT_UPLOAD_FORCE_PATH_STYLE: "true"
depends_on:
postgres: { condition: service_healthy }
clickhouse: { condition: service_healthy }
redis: { condition: service_healthy }
minio: { condition: service_healthy }
langfuse-worker:
image: docker.io/langfuse/langfuse-worker:3
environment: &langfuse-worker-env
DATABASE_URL: postgresql://postgres:postgres@postgres:5432/postgres
CLICKHOUSE_URL: http://clickhouse:8123
CLICKHOUSE_USER: clickhouse
CLICKHOUSE_PASSWORD: clickhouse
CLICKHOUSE_MIGRATION_URL: clickhouse://clickhouse:9000
REDIS_CONNECTION_STRING: redis://redis:6379
SALT: ${SALT:-some-other-random-string} # CHANGEME
ENCRYPTION_KEY: ${ENCRYPTION_KEY:-must-be-32-chars-long-aaaaaaaaaa} # CHANGEME, 32 chars
depends_on:
postgres: { condition: service_healthy }
clickhouse: { condition: service_healthy }
redis: { condition: service_healthy }
minio: { condition: service_healthy }
postgres:
image: docker.io/postgres:${POSTGRES_VERSION:-17}
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres # CHANGEME
POSTGRES_DB: postgres
volumes:
- postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 3s
retries: 10
clickhouse:
image: clickhouse/clickhouse-server:latest # Mandatory for ARM Mac
environment:
CLICKHOUSE_DB: default
CLICKHOUSE_USER: clickhouse
CLICKHOUSE_PASSWORD: clickhouse
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
ulimits:
nofile: { soft: 262144, hard: 262144 }
volumes:
- clickhouse_data:/var/lib/clickhouse
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8123/ping"]
interval: 5s
timeout: 3s
retries: 10
redis:
image: docker.io/redis:7
command: redis-server --maxmemory-policy noeviction
volumes:
- redis_data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 10
minio:
image: docker.io/minio/minio:latest
command: server /data --console-address ":9090"
environment:
MINIO_ROOT_USER: minio
MINIO_ROOT_PASSWORD: miniosecret # CHANGEME
ports:
- "9090:9090" # MinIO console
volumes:
- minio_data:/data
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
interval: 5s
timeout: 3s
retries: 10
volumes:
postgres_data:
clickhouse_data:
redis_data:
minio_data:
The expected output when all 6 services are healthy:
docker compose ps
# NAME SERVICE STATUS
# langfuse-web-1 langfuse-web Up (healthy)
# langfuse-worker-1 langfuse-worker Up
# langfuse-postgres-1 postgres Up (healthy)
# langfuse-clickhouse-1 clickhouse Up (healthy)
# langfuse-redis-1 redis Up (healthy)
# langfuse-minio-1 minio Up (healthy)
**Note**: langfuse-worker will not show healthy (it has no built-in healthcheck endpoint), but Up is the correct status.
Verifying traces really arrived
After the stack is up, run this minimal workflow to verify:
1. In n8n, create a new workflow
2. Add a Schedule Trigger (every minute)
3. Add an HTTP Request node (GET https://api.github.com/zen)
4. Save and activate
After 1-2 minutes, open Langfuse UI → your project → Traces. You should see one entry in the trace list. Click into it and look at the span tree:
workflow.execute
└── node.execute (HTTP Request)
└── http.client.request
└── http.client.response (200, 543ms)
If you only see workflow.execute with no child spans, your N8N_OTEL_TRACES_INCLUDE_NODE_SPANS is not set to true.
5 integration options compared side-by-side
| Option | Setup cost | Token stats | Best for |
|---|---|---|---|
| **n8n 2.x native OTEL (recommended here)** | Low (env vars) | Need HTTP Request + Code node manual parse | Want full trace tree, don't care about cost |
| **rorubyy/n8n-nodes-openai-langfuse** | Medium (community node install) | Automatic | OpenAI only, want cost tracking |
| **rwb-truelime/n8n-langfuse-shipper** (Python) | High (extra service) | Automatic | Custom batching, already running Python services |
| **OpenRouter Broadcast** | Medium (replace LLM provider) | Automatic | Already use OpenRouter for multi-model routing |
| **HTTP Request node direct to Langfuse API** | Low | Manual parse | Single workflow verification, ad-hoc debugging |
My pick: production uses the "native OTEL + replace OpenAI node with rorubyy/n8n-nodes-openai-langfuse" combo, with HTTP Request nodes covered by native OTEL as a fallback.
Does this solve my original 3 problems?
Going back to the 3 pain points I had at the start, after wiring up Langfuse:
1. **WeChat summarizer failure**: Click into the trace and the failed node shows http.status_code: 429 (OpenAI rate limit). I added a retry node and it stopped failing.
2. **Email reply style change**: Compare the prompt field across traces. Found that a teammate edited a template variable in the Code node. git diff rolled it back.
3. PDF Q&A 8s latency: The span tree showed the embedding node alone taking 6.2s. Switched to the quantized bge-m3 model and it dropped to 1.8s.
Bottom line: All 3 problems became observable, diagnosable, fixable. That's the biggest value Langfuse gives n8n users — upgrading from "the workflow ran successfully" to "the workflow ran correctly".
Frequently asked questions
Q: Does Langfuse v3 require ClickHouse?
A: Yes. v3 forces ClickHouse for trace storage (v2 used Postgres for traces, v3 splits that out). If disk IO is tight, you can put ClickHouse data on S3 blob storage (Langfuse supports S3-as-disk mode via 5 environment variables: LANGFUSE_S3_EVENT_UPLOAD_BUCKET etc.).
Q: Does n8n need to be 2.x to use OTEL?
A: Yes. n8n 1.x has no built-in OTEL exporter; community solutions all require editing the Dockerfile to add npm packages. 2.x toggles it on with environment variables. n8n 2.0 stable shipped 2025-09.
Q: What's the minimum memory for Langfuse v3?
A: 2 GiB can boot for dev, but ClickHouse eats 1.2 GiB on startup, Postgres 400 MiB, langfuse-web 600 MiB, langfuse-worker 800 MiB. 2 GiB will hit OOM kills frequently. 4 GiB minimum, 16 GiB recommended for production.
Q: Will traces fill up the disk?
A: Yes. ClickHouse has no built-in size cap. Set max_server_memory_usage and max_table_size_to_drop, or use TTL for auto-cleanup. Langfuse's official docs cover ClickHouse storage growth management.
What's next
- **Add Langfuse Score**: Use Langfuse SDK v2 to add a quality score (0-1) per trace, combined with LLM-as-a-judge for automated answer evaluation.
- **Wire up Prompt Management**: Replace inline prompt strings in n8n with Langfuse Prompt references. Switch dev/staging/prod without editing the n8n workflow.
- **Dataset + Experiment**: Convert historical traces into a dataset, then compare new prompt versions against it.
---
Further reading (related published articles)
- n8n + Ollama + Qdrant Integration Pitfalls: From Docker Network to RAG Workflow
- 5 Real Production-Grade n8n Self-Hosted Pitfalls
- Claude Code Routines + n8n Integration: 5 Real Traps
- MCP Server Setup Traps: 5 Common Pitfalls
---
> 🚀 Running LLM workflows in n8n? MiniMax Token Plan gives self-hosted users 1B stable monthly tokens, with direct China connectivity, and compatibility with Anthropic/OpenAI/DeepSeek quotas. 👉 Get it here: https://platform.minimaxi.com/subscribe/token-plan?code=E5yur9NOub&source=link
📌 This article was AI-assisted generated and human-reviewed | TechPassive — An AI-driven content testing site focused on real tool reviews
🔗 Recommended Tools
These are carefully selected tools. Using our affiliate links supports us to keep producing quality content: