n8n + Langfuse v3 Self-Hosted in Production: OTEL Tracing and 5 Real Pitfalls (2026)

Why I put n8n and Langfuse together

In April I built 3 LLM workflows in n8n (a WeChat article summarizer, a customer email responder, and a PDF knowledge-base Q&A), each with 7-12 nodes, and ran into 3 problems I couldn't diagnose from n8n alone:

1. The WeChat summarizer workflow failed every Tuesday around 3 AM. n8n only showed "OpenAI node errored", so I couldn't tell whether my prompt had changed or I had hit the API rate limit.

2. The customer email workflow would occasionally switch reply style without warning. I suspected prompt injection or a model swap.

3. The PDF Q&A RAG pipeline chained 4 nodes with 8s of latency, but I couldn't see whether the bottleneck was the embedding step or the LLM.

Langfuse (langfuse.com) is an MIT-licensed, open-source LLM observability platform. The n8n ↔ Langfuse relationship changed in 2026: **n8n 2.x added native OpenTelemetry export on 2026-04-13** (the N8N_OTEL_* environment variables). Once it is enabled, every n8n workflow execution becomes a trace in Langfuse, with a span per node when node spans are switched on, and no community node is required.

But "native support" is not "zero configuration": I spent 3 days working through 5 production traps, and this is the complete post-mortem.

Prerequisites and versions

Tested version matrix (as of 2026-06-19):

| Component | Version | Role |
|---|---|---|
| n8n | 2.x (N8N_OTEL_* available) | Workflow execution |
| Langfuse | v3 stable (released 2024-12-09) | LLM trace storage and visualization |
| PostgreSQL | 17 | Langfuse transactional state |
| ClickHouse | 24.x (server) | Langfuse trace analytics (new in v3) |
| Redis | 7 | Langfuse cache and queue |
| MinIO | latest (S3-compatible) | Langfuse event/media upload |
| Docker Compose | v2.20+ | Orchestration |

The official hardware recommendation is 4 cores / 16 GiB RAM / 30 GB disk. I ran the whole stack on a 2-core / 4 GiB dev box for 3 weeks before migrating to the recommended spec, so 2 cores / 4 GiB is the practical minimum to get it working; for production, follow the official recommendation.

5 real production traps

Trap 1: ClickHouse image won't start on ARM Mac

**Symptom**: docker compose up shows clickhouse-1 restarting in a loop, with Illegal instruction (core dumped).

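**Root cause**: the container is running the x86_64 (amd64) build of ClickHouse under emulation. That build needs SSE4.2, which the emulation layer on Apple Silicon (M1/M2/M3/M4) may not provide, hence the illegal-instruction crash loop. The image name by itself is not the culprit: `docker.io/clickhouse/clickhouse-server` (the name Langfuse v3's default docker-compose.yml uses) and `clickhouse/clickhouse-server` are the same Docker Hub repository, because Docker fills in `docker.io` when no registry is given, and that repository is multi-arch. An amd64 pull on an ARM Mac usually means something forced the platform, such as `DOCKER_DEFAULT_PLATFORM=linux/amd64` in your shell or a `platform: linux/amd64` line in a compose file, or an old amd64 image is still cached.

**Fix**: remove whatever forces amd64, or pin the native platform explicitly:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest  # multi-arch; identical to docker.io/clickhouse/clickhouse-server
    platform: linux/arm64  # ARM Mac only: use the native build instead of an emulated amd64 one
    environment:
      CLICKHOUSE_DB: default
      CLICKHOUSE_USER: clickhouse
      CLICKHOUSE_PASSWORD: clickhouse  # CHANGEME
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
```

Then run `docker compose pull clickhouse` so the cached amd64 image is replaced. To confirm the tag publishes an arm64 variant:

```shell
docker manifest inspect clickhouse/clickhouse-server:latest | grep -A1 arm64
```

If you see an `"architecture": "arm64"` entry, the multi-arch manifest is in place. To check what was actually pulled locally, `docker image inspect --format '{{.Architecture}}' clickhouse/clickhouse-server:latest` should print `arm64`.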

Trap 2: langfuse-worker hangs for 5 minutes without "Ready"

**Symptom**: langfuse-worker-1 logs stop at Running database migrations... with no further output. langfuse-web-1 also hangs at Connecting to database.

**Root cause**: on first start, Langfuse v3 applies its Postgres migrations (including the v2→v3 schema changes when you upgrade an existing instance) and creates the ClickHouse table structure. On a 2-core machine the two steps together take 4-6 minutes. The process is not dead; it is waiting.

**Fix**: give your health checks and deploy scripts a startup timeout of 10 minutes (never less than 6), then verify in 3 stages:

```shell
# Step 1: follow the worker's migration progress (compose service names avoid
# guessing the project-prefixed container names)
docker compose logs -f langfuse-worker | grep -E "migration|ready|listening"

# Step 2: confirm the ClickHouse tables exist (credentials from docker-compose.yml)
docker compose exec clickhouse clickhouse-client --user clickhouse --password clickhouse -q "SHOW TABLES FROM default"

# Step 3: confirm langfuse-web answers
curl -s http://localhost:3000/api/public/health | jq .
```

A successful sequence ends with a line like `langfuse-web-1 | ✓ Ready in 2.3s`. **If the last line is `✓ Compiled successfully` instead, Next.js has only finished compiling; the OTLP endpoint is not up yet**.

Trap 3: n8n runs in Docker, but its OTEL endpoint points to localhost

**Symptom**: n8n logs show OpenTelemetry: Exporter failed: ECONNREFUSED 127.0.0.1:4318, but the browser can reach Langfuse just fine.

**Root cause**: when n8n also runs in Docker, localhost and 127.0.0.1 refer to the n8n container itself, not to the host where Langfuse's port is published. n8n has to reach langfuse-web by its service name over a shared Docker network.

**Fix**: attach n8n to a network that langfuse-web is already on. I keep n8n in a second compose file, docker-compose.n8n.yml, which joins the langfuse_default network created by the Langfuse compose project:

```yaml
# docker-compose.n8n.yml
services:
  n8n:
    image: n8nio/n8n:2
    networks:
      - langfuse_default  # Default network created by Langfuse compose
    environment:
      N8N_OTEL_ENABLED: "true"
      N8N_OTEL_EXPORTER_OTLP_ENDPOINT: "http://langfuse-web:3000"
      N8N_OTEL_EXPORTER_OTLP_TRACING_PATH: "/api/public/otel/v1/traces"
      N8N_OTEL_TRACES_INCLUDE_NODE_SPANS: "true"
      N8N_OTEL_TRACES_PRODUCTION_ONLY: "false"

networks:
  langfuse_default:
    external: true
    name: langfuse_default  # Must match the network name created by Langfuse compose
```

Start order matters: first `docker compose -f docker-compose.yml up -d` (Langfuse), then `docker compose -f docker-compose.n8n.yml up -d` (n8n). Otherwise the second command fails because the external network langfuse_default does not exist yet.

Trap 4: Traces reach Langfuse but no LLM token/cost stats

**Symptom**: in the Langfuse UI you can see spans for every n8n node (HTTP Request, Set, Code), but the OpenAI/Anthropic nodes show 0 tokens and 0 cost.

**Root cause**: n8n's OTEL exporter only exports span metadata; it does **not parse the `usage` field of the LLM response body**. Langfuse needs the input and output token counts plus the model name to compute cost.

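**Fix**: choose one of two paths.

**Path A: Community node rorubyy/n8n-nodes-openai-langfuse** (n8n ≥ 0.187). It calls Langfuse's ingestion API directly from inside the OpenAI node, so token stats are recorded automatically. Install it from the UI or manually:

```shell
# UI: Settings → Community Nodes → Install, then enter the npm package name n8n-nodes-openai-langfuse
# Manual (the method in the n8n docs): install into ~/.n8n/nodes inside the container, then restart
docker compose -f docker-compose.n8n.yml exec n8n sh -c 'mkdir -p ~/.n8n/nodes && cd ~/.n8n/nodes && npm install n8n-nodes-openai-langfuse'
docker compose -f docker-compose.n8n.yml restart n8n
```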

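**Path B: a Code node after the HTTP Request node** that extracts usage by hand. OpenAI-style APIs report `prompt_tokens`/`completion_tokens`/`total_tokens`, while Anthropic reports `input_tokens`/`output_tokens` with no total, so this version reads both:

```javascript
// Code node, mode = "Run Once for Each Item"
const body = $input.item.json;
const usage = body.usage ?? {};
// OpenAI-style field names first, Anthropic's as the fallback
const input = usage.prompt_tokens ?? usage.input_tokens ?? 0;
const output = usage.completion_tokens ?? usage.output_tokens ?? 0;
return {
  json: {
    langfuse_update: {
      usage: {
        input,
        output,
        total: usage.total_tokens ?? input + output // Anthropic sends no total
      },
      model: body.model
    }
  }
};
```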

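Then add an HTTP Request node that POSTs the result to http://langfuse-web:3000/api/public/ingestion, authenticated with the project's public key and secret key as Basic auth credentials, to attach the usage to the trace.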
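The body that HTTP Request node sends is a Langfuse ingestion batch. Written as plain JavaScript so it can be tested outside n8n, a sketch might look like the block below. It assumes the public ingestion API's `{ batch: [...] }` envelope with a `generation-create` event and a `usage` object of input/output/total (newer Langfuse versions also accept `usageDetails`, so check the API reference for your version). Because the id of the span n8n exported is not known inside the workflow, this creates a new generation under the trace rather than updating that span. `buildGenerationBatch` and `sendBatch` are hypothetical helpers.

```javascript
// Sketch: record LLM token usage in Langfuse as a generation attached to a trace.
// Assumptions: the public ingestion API takes { batch: [...] } with
// "generation-create" events and a `usage` object of { input, output, total },
// and answers with per-event { successes, errors }. Verify for your version.
const { randomUUID } = require("node:crypto");

function buildGenerationBatch({ traceId, name, model, usage, input, output }) {
  const now = new Date().toISOString();
  return {
    batch: [
      {
        id: randomUUID(), // event id, lets the server de-duplicate retries
        type: "generation-create",
        timestamp: now,
        body: {
          id: randomUUID(), // id of the new generation observation
          traceId, // the existing trace to attach to
          name,
          model,
          startTime: now,
          input,
          output,
          usage, // { input, output, total } as produced by the Code node
        },
      },
    ],
  };
}

async function sendBatch(baseUrl, publicKey, secretKey, payload) {
  const auth = Buffer.from(`${publicKey}:${secretKey}`).toString("base64");
  const res = await fetch(new URL("/api/public/ingestion", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Basic ${auth}` },
    body: JSON.stringify(payload),
  });
  // A multi-status response can still contain per-event errors, so surface them.
  const result = await res.json();
  return { status: res.status, errors: result.errors ?? [] };
}
```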

Trap 5: All SDK v1.x calls fail after upgrading to v3

**Symptom**: after upgrading Langfuse, every app still on Langfuse JS SDK v1.x gets a 401: `Authentication failed: API key not found`.

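**Root cause**: SDK v1.x is several major versions behind the v3 server, so check the SDK compatibility notes in the Langfuse v2→v3 upgrade guide before assuming an old SDK can still talk to v3. The error text also has a simpler reading: the server does not know the key. If you stood up v3 as a fresh install instead of migrating the v2 Postgres database, keys created on the old instance do not exist on the new one.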

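**Fix**:

```shell
# Upgrade to the current TypeScript SDK (the @langfuse/* packages). It exports via
# OpenTelemetry, so tracing also needs @langfuse/otel and the OTel Node SDK.
npm install @langfuse/core@latest @langfuse/tracing@latest @langfuse/otel@latest @opentelemetry/sdk-node
# Keys come from the project's API Keys settings page
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
# Self-hosted instances must set the base URL explicitly. The @langfuse/* SDK reads
# LANGFUSE_BASE_URL; the older `langfuse` package reads LANGFUSE_BASEURL.
export LANGFUSE_BASE_URL="http://localhost:3000"
```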

The Langfuse v2→v3 upgrade guide has the full list of breaking changes and the SDK compatibility notes. For the endpoints used in this article: OTLP traces go to `/api/public/otel/v1/traces`, and batch ingestion goes to `/api/public/ingestion`.
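Before blaming the server for a 401, it helps to rule out the client-side mix-ups that produce the same error. In the sketch below, `checkLangfuseEnv` flags missing or swapped keys and a missing base URL, and `whoAmI` asks the server which project the keys belong to. Both are hypothetical helpers; the `pk-lf-`/`sk-lf-` prefixes are the ones the Langfuse UI issues, and the `GET /api/public/projects` lookup is assumed to be available on your version.

```javascript
// Pre-flight check for the Langfuse SDK environment (a sketch).
// Catches the mix-ups that also surface as a 401: swapped keys, missing base URL.
// Assumption: keys carry the pk-lf- / sk-lf- prefixes shown in the Langfuse UI.

function checkLangfuseEnv(env = process.env) {
  const problems = [];
  const pk = env.LANGFUSE_PUBLIC_KEY ?? "";
  const sk = env.LANGFUSE_SECRET_KEY ?? "";
  const baseUrl = env.LANGFUSE_BASE_URL ?? env.LANGFUSE_BASEURL ?? "";

  if (!pk.startsWith("pk-lf-")) problems.push("LANGFUSE_PUBLIC_KEY should start with pk-lf-");
  if (!sk.startsWith("sk-lf-")) problems.push("LANGFUSE_SECRET_KEY should start with sk-lf-");
  if (pk.startsWith("sk-lf-") || sk.startsWith("pk-lf-")) problems.push("public and secret keys look swapped");
  if (!baseUrl) problems.push("no base URL set; self-hosted instances must set it explicitly");
  else if (!/^https?:\/\//.test(baseUrl)) problems.push(`base URL "${baseUrl}" lacks http:// or https://`);
  return problems;
}

// Ask the server which project the keys belong to.
// Assumption: GET /api/public/projects exists on your Langfuse version.
async function whoAmI(env = process.env) {
  const baseUrl = env.LANGFUSE_BASE_URL ?? env.LANGFUSE_BASEURL;
  const auth = Buffer.from(`${env.LANGFUSE_PUBLIC_KEY}:${env.LANGFUSE_SECRET_KEY}`).toString("base64");
  const res = await fetch(new URL("/api/public/projects", baseUrl), {
    headers: { Authorization: `Basic ${auth}` },
  });
  // 200: the keys belong to a project on this server; 401: this server does not know them.
  return { status: res.status, body: await res.json().catch(() => null) };
}
```

A clean `checkLangfuseEnv` result followed by a 401 from `whoAmI` points at the server side, for example keys that only existed on the old instance.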

Complete docker-compose.yml template

Here's the final version after 3 days of debugging (explanatory comments kept to a minimum, every # CHANGEME marker preserved):

```yaml
# docker-compose.yml
services:
  langfuse-web:
    image: docker.io/langfuse/langfuse:3
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/postgres  # CHANGEME
      NEXTAUTH_URL: http://localhost:3000  # CHANGEME, required
      NEXTAUTH_SECRET: ${NEXTAUTH_SECRET:-some-long-random-string}  # CHANGEME
      SALT: ${SALT:-some-other-random-string}  # CHANGEME, same value as the worker
      ENCRYPTION_KEY: ${ENCRYPTION_KEY:-0000000000000000000000000000000000000000000000000000000000000000}  # CHANGEME, 64 hex chars (openssl rand -hex 32), same value as the worker
      LANGFUSE_INIT_ORG_ID: ${LANGFUSE_INIT_ORG_ID:-default-org}
      LANGFUSE_INIT_ORG_NAME: ${LANGFUSE_INIT_ORG_NAME:-Default}
      LANGFUSE_INIT_PROJECT_ID: ${LANGFUSE_INIT_PROJECT_ID:-default-project}
      LANGFUSE_INIT_PROJECT_PUBLIC_KEY: ${LANGFUSE_INIT_PROJECT_PUBLIC_KEY:-pk-lf-default}  # CHANGEME, set together with the secret key
      LANGFUSE_INIT_PROJECT_SECRET_KEY: ${LANGFUSE_INIT_PROJECT_SECRET_KEY:-sk-lf-default}  # CHANGEME
      LANGFUSE_INIT_USER_NAME: ${LANGFUSE_INIT_USER_NAME:-admin}
      LANGFUSE_INIT_USER_PASSWORD: ${LANGFUSE_INIT_USER_PASSWORD:-admin123}  # CHANGEME
      LANGFUSE_INIT_USER_EMAIL: ${LANGFUSE_INIT_USER_EMAIL:-admin@example.com}  # CHANGEME
      CLICKHOUSE_URL: http://clickhouse:8123
      CLICKHOUSE_USER: clickhouse
      CLICKHOUSE_PASSWORD: clickhouse  # CHANGEME
      CLICKHOUSE_MIGRATION_URL: clickhouse://clickhouse:9000
      REDIS_CONNECTION_STRING: redis://redis:6379
      LANGFUSE_S3_EVENT_UPLOAD_BUCKET: langfuse
      LANGFUSE_S3_EVENT_UPLOAD_REGION: auto
      LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID: minio
      LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY: miniosecret  # CHANGEME
      LANGFUSE_S3_EVENT_UPLOAD_ENDPOINT: http://minio:9000
      LANGFUSE_S3_EVENT_UPLOAD_FORCE_PATH_STYLE: "true"
    depends_on:
      postgres: { condition: service_healthy }
      clickhouse: { condition: service_healthy }
      redis: { condition: service_healthy }
      minio: { condition: service_healthy }

  langfuse-worker:
    image: docker.io/langfuse/langfuse-worker:3
    environment:
      # Shared settings must match langfuse-web
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/postgres  # CHANGEME
      CLICKHOUSE_URL: http://clickhouse:8123
      CLICKHOUSE_USER: clickhouse
      CLICKHOUSE_PASSWORD: clickhouse  # CHANGEME
      CLICKHOUSE_MIGRATION_URL: clickhouse://clickhouse:9000
      REDIS_CONNECTION_STRING: redis://redis:6379
      SALT: ${SALT:-some-other-random-string}  # CHANGEME
      ENCRYPTION_KEY: ${ENCRYPTION_KEY:-0000000000000000000000000000000000000000000000000000000000000000}  # CHANGEME, 64 hex chars
      # The worker reads ingested events back from the same bucket
      LANGFUSE_S3_EVENT_UPLOAD_BUCKET: langfuse
      LANGFUSE_S3_EVENT_UPLOAD_REGION: auto
      LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID: minio
      LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY: miniosecret  # CHANGEME
      LANGFUSE_S3_EVENT_UPLOAD_ENDPOINT: http://minio:9000
      LANGFUSE_S3_EVENT_UPLOAD_FORCE_PATH_STYLE: "true"
    depends_on:
      postgres: { condition: service_healthy }
      clickhouse: { condition: service_healthy }
      redis: { condition: service_healthy }
      minio: { condition: service_healthy }

  postgres:
    image: docker.io/postgres:${POSTGRES_VERSION:-17}
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres  # CHANGEME
      POSTGRES_DB: postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  clickhouse:
    image: clickhouse/clickhouse-server:latest  # multi-arch; on an ARM Mac add platform: linux/arm64 if amd64 gets pulled (Trap 1)
    environment:
      CLICKHOUSE_DB: default
      CLICKHOUSE_USER: clickhouse
      CLICKHOUSE_PASSWORD: clickhouse
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    ulimits:
      nofile: { soft: 262144, hard: 262144 }
    volumes:
      - clickhouse_data:/var/lib/clickhouse
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8123/ping"]
      interval: 5s
      timeout: 3s
      retries: 10

  redis:
    image: docker.io/redis:7
    command: redis-server --maxmemory-policy noeviction
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10

  minio:
    image: docker.io/minio/minio:latest
    entrypoint: sh
    # Create the "langfuse" bucket directory before starting, otherwise uploads fail
    command: -c 'mkdir -p /data/langfuse && minio server --address ":9000" --console-address ":9090" /data'
    environment:
      MINIO_ROOT_USER: minio
      MINIO_ROOT_PASSWORD: miniosecret  # CHANGEME
    ports:
      - "9090:9090"  # MinIO console
    volumes:
      - minio_data:/data
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]  # the minio image ships mc
      interval: 5s
      timeout: 3s
      retries: 10

volumes:
  postgres_data:
  clickhouse_data:
  redis_data:
  minio_data:
```

The expected output once the stack is up (five services healthy, plus the worker running):

```shell
docker compose ps
# NAME                  SERVICE             STATUS
# langfuse-web-1        langfuse-web        Up (healthy)
# langfuse-worker-1     langfuse-worker     Up
# langfuse-postgres-1   postgres            Up (healthy)
# langfuse-clickhouse-1 clickhouse          Up (healthy)
# langfuse-redis-1      redis               Up (healthy)
# langfuse-minio-1      minio               Up (healthy)
```

**Note**: langfuse-worker will not show healthy because this compose file defines no healthcheck for it; Up is the correct status.

Verifying traces really arrived

After the stack is up, run this minimal workflow to verify:

1. In n8n, create a new workflow

2. Add a Schedule Trigger (every minute)

3. Add an HTTP Request node (GET https://api.github.com/zen)

4. Save and activate

After 1-2 minutes, open the Langfuse UI → your project → Traces. You should see at least one entry in the trace list (a new one each minute while the workflow is active). Click into it and look at the span tree:

```text
workflow.execute
  └── node.execute (HTTP Request)
       └── http.client.request
            └── http.client.response (200, 543ms)
```

If you only see workflow.execute with no child spans, check that N8N_OTEL_TRACES_INCLUDE_NODE_SPANS is set to "true".
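You can also confirm arrival from a script instead of the UI. This sketch lists the newest traces through the Langfuse public API. It assumes `GET /api/public/traces` accepts a `limit` query parameter and returns `{ data: [...] }`, that Langfuse is served at the root of `baseUrl`, and that you pass the project's key pair; `tracesRequest` and `latestTraces` are hypothetical names.

```javascript
// List the newest Langfuse traces through the public API (a sketch).
// Assumptions: GET /api/public/traces accepts `limit` and returns { data: [...] };
// Langfuse is served at the root of baseUrl; Node 18+ for global fetch.

function tracesRequest(baseUrl, publicKey, secretKey, limit = 5) {
  const url = new URL("/api/public/traces", baseUrl);
  url.searchParams.set("limit", String(limit));
  const auth = Buffer.from(`${publicKey}:${secretKey}`).toString("base64");
  return { url: url.toString(), headers: { Authorization: `Basic ${auth}` } };
}

async function latestTraces(baseUrl, publicKey, secretKey, limit = 5) {
  const { url, headers } = tracesRequest(baseUrl, publicKey, secretKey, limit);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`Langfuse returned ${res.status}`);
  const { data } = await res.json();
  // Keep only the fields needed to spot the scheduled n8n run.
  return data.map((t) => ({ id: t.id, name: t.name, timestamp: t.timestamp }));
}
```

Splitting request construction from the network call keeps the URL and auth logic testable without a running instance.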

5 integration options compared side-by-side

| Option | Setup cost | Token stats | Best for |
|---|---|---|---|
| n8n 2.x native OTEL (recommended here) | Low (env vars) | Manual: HTTP Request + Code node to parse usage | Full trace tree when cost tracking doesn't matter |
| rorubyy/n8n-nodes-openai-langfuse | Medium (community node install) | Automatic | OpenAI only, with cost tracking |
| rwb-truelime/n8n-langfuse-shipper (Python) | High (extra service) | Automatic | Custom batching, teams already running Python services |
| OpenRouter Broadcast | Medium (replace LLM provider) | Automatic | Teams already using OpenRouter for multi-model routing |
| HTTP Request node direct to Langfuse API | Low | Manual parse | Single-workflow verification, ad-hoc debugging |

My pick: in production I combine native OTEL with rorubyy/n8n-nodes-openai-langfuse in place of the stock OpenAI node; native OTEL still covers the HTTP Request and other non-LLM nodes as a fallback.

Does this solve my original 3 problems?

Going back to the 3 pain points I had at the start, after wiring up Langfuse:

1. **WeChat summarizer failure**: clicking into the trace showed http.status_code: 429 on the failed node (OpenAI rate limiting). After I added retries, the failures stopped.

2. **Email reply style change**: comparing the prompt field across traces showed that a teammate had edited a template variable in the Code node. git diff pinpointed the change, and I reverted it.

3. **PDF Q&A 8s latency**: the span tree showed the embedding node alone taking 6.2s, roughly three quarters of the 8s total. After switching to a quantized bge-m3 model, it dropped to 1.8s.

Bottom line: all 3 problems became observable, diagnosable, and fixable. That's the biggest value Langfuse gives n8n users: moving from "the workflow ran successfully" to "the workflow ran correctly".

Frequently asked questions

Q: Does Langfuse v3 require ClickHouse?

A: Yes. v3 requires ClickHouse for trace storage (v2 kept traces in Postgres; v3 moves them out). If disk IO is tight, ClickHouse itself can be configured with an S3-backed disk, but that is a ClickHouse server storage setting, not a Langfuse one: the LANGFUSE_S3_EVENT_UPLOAD_* variables only tell Langfuse where to stage raw ingestion events.

Q: Does n8n need to be 2.x to use OTEL?

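A: Yes, and specifically a 2.x build from 2026-04-13 or later, when the N8N_OTEL_* variables were added (2.0 stable shipped in 2025-09, before that). n8n 1.x has no built-in OTEL exporter, and the community workarounds all require editing the Dockerfile to add npm packages; on a recent 2.x build, environment variables are enough.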

Q: What's the minimum memory for Langfuse v3?

A: The stack may boot in 2 GiB for dev, but expect frequent OOM kills: at startup ClickHouse takes about 1.2 GiB, Postgres 400 MiB, langfuse-web 600 MiB and langfuse-worker 800 MiB, which is already roughly 3 GiB before Redis and MinIO. Plan for 4 GiB minimum and 16 GiB for production.

Q: Will traces fill up the disk?

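A: Yes. ClickHouse has no built-in cap on stored data. Of the usual knobs, only TTL actually bounds disk usage: max_server_memory_usage caps RAM, and max_table_size_to_drop is a safety limit on dropping large tables, so neither stops growth. Langfuse's official docs cover managing ClickHouse storage growth.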

What's next

- **Add Langfuse Score**: use the Langfuse SDK to attach a quality score (0-1) to each trace, combined with LLM-as-a-judge for automated answer evaluation.
- **Wire up Prompt Management**: replace inline prompt strings in n8n with Langfuse Prompt references, so you can switch dev/staging/prod without editing the n8n workflow.
- **Dataset + Experiment**: convert historical traces into a dataset, then compare new prompt versions against it.
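For the scoring step, here is a sketch of attaching a 0-1 score to a trace through the public API. It assumes `POST /api/public/scores` accepts `traceId`, `name`, `value` and an optional `comment`; the judge that produces the value is out of scope, and the score name `answer_quality` is a hypothetical placeholder.

```javascript
// Sketch: attach a 0-1 quality score to a trace through the Langfuse public API.
// Assumptions: POST /api/public/scores accepts { traceId, name, value, comment };
// Node 18+ for global fetch; "answer_quality" is a hypothetical score name.

function buildScore(traceId, value, { name = "answer_quality", comment } = {}) {
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(`score must be in [0, 1], got ${value}`);
  }
  // Only include comment when given, so the payload stays minimal.
  return { traceId, name, value, ...(comment ? { comment } : {}) };
}

async function postScore(baseUrl, publicKey, secretKey, score) {
  const auth = Buffer.from(`${publicKey}:${secretKey}`).toString("base64");
  const res = await fetch(new URL("/api/public/scores", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Basic ${auth}` },
    body: JSON.stringify(score),
  });
  if (!res.ok) throw new Error(`score rejected: ${res.status} ${await res.text()}`);
  return res.json();
}
```

Validating the range client-side keeps a misbehaving judge from writing out-of-scale values that would skew score dashboards.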

---