GPU and Docker Containerized Deployment: Configuration Traps

Tags: LLM, Ollama, local LLM, local deployment, GPU configuration, Docker, pitfalls

Pitfall 1: Linux /tmp noexec Breaks Model Loading

On some Linux distributions, Ollama stores temporary executable files in /tmp during model loading. If /tmp is mounted with the noexec flag (a security hardening practice common on shared hosting and some enterprise Linux setups), Ollama crashes at runtime with no clear signal about the root cause.

I hit this myself on a Hetzner server running Ubuntu 24.04. The error logs showed nothing obvious, and the model pull command failed silently. After two hours of debugging, I found the cause by accident while checking disk space.

How to diagnose:

mount | grep /tmp

If you see /tmp on ... nosuid,noexec,nodev, that's your problem. The noexec flag prevents any executable files in /tmp from running—which includes Ollama's runtime binaries.
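The same check can be scripted so it runs before Ollama starts. This is a minimal sketch, not part of Ollama: the function names has_noexec and tmp_is_noexec are made up for illustration, and it assumes findmnt from util-linux is installed.

```shell
#!/bin/sh
# has_noexec OPTIONS
# Succeeds if a comma-separated mount option string contains "noexec"
# as a whole option (so "exec" or "noexecfoo" would not match).
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# tmp_is_noexec [PATH]   (hypothetical helper)
# Looks up the mount options of the filesystem that holds PATH
# (default /tmp) and tests them. Returns 2 if findmnt fails.
tmp_is_noexec() {
    opts=$(findmnt -n -o OPTIONS --target "${1:-/tmp}") || return 2
    has_noexec "$opts"
}
```

A startup script could then call tmp_is_noexec and print a warning, or export OLLAMA_TMPDIR, before launching the server.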

The fix is straightforward: set OLLAMA_TMPDIR to a location where executables are allowed:

export OLLAMA_TMPDIR=/usr/share/ollama/
ollama serve

Note: /usr/share/ollama/ needs to be writable by the Ollama process. For a production server, create this directory and set ownership:

sudo mkdir -p /usr/share/ollama
sudo chown ollama:ollama /usr/share/ollama

In Docker environments, a cleaner approach is to mount an executable tmpfs:

docker run --gpus all --tmpfs /tmp:exec ollama/ollama

---

Pitfall 2: GPU Discovery Failure (NVIDIA CUDA Errors)

When Ollama can't find your NVIDIA GPU, it falls back to CPU inference—which is 10-50x slower depending on your model and GPU. The logs show cryptic error codes: 3 means "not initialized", 46 means "device unavailable", 100 means "no device found", and 999 means "unknown error".
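For quick triage, the four codes listed above can be wrapped in a small lookup. This is only an illustrative sketch: nvidia_error_meaning is a made-up helper name, and it covers just the codes mentioned in this article.

```shell
#!/bin/sh
# nvidia_error_meaning CODE   (hypothetical helper)
# Prints the meaning of one of the NVIDIA error codes described above.
nvidia_error_meaning() {
    case "$1" in
        3)   echo "not initialized" ;;
        46)  echo "device unavailable" ;;
        100) echo "no device found" ;;
        999) echo "unknown error" ;;
        *)   echo "unrecognized code $1" ;;
    esac
}
```

Calling nvidia_error_meaning 46 prints "device unavailable", which is the case I describe next.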

I encountered error code 46 on a DigitalOcean GPU droplet. The GPU was visible to the host system but Ollama couldn't initialize it.

Troubleshooting steps that actually work:

# Step 1: Confirm docker can see the GPU at all
docker run --gpus all ubuntu nvidia-smi

# Step 2: Force reload the nvidia_uvm driver
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

# Step 3: Verify module is loaded
lsmod | grep nvidia

If those commands succeed but Ollama still fails, enable verbose CUDA logging:

CUDA_ERROR_LEVEL=50 ollama serve

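Then check the logs. If you started ollama serve by hand as above, the extra output appears in that terminal. If Ollama runs as a systemd service, set CUDA_ERROR_LEVEL=50 in the service environment, restart it, and search the journal:

journalctl -u ollama --no-pager | grep -i cuda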

Ollama's official documentation recommends running the latest NVIDIA drivers as a first step when GPU discovery fails. On Ubuntu, you can update with (550 is the branch I used; pick the current one for your release):

sudo apt update && sudo apt install nvidia-driver-550
sudo reboot

The driver version matters more than the GPU generation. A Pascal-era GTX 1080 with a recent driver often works better than a newer GPU with outdated drivers.
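Since the driver version is the thing to watch, a deployment script can compare it against a minimum before starting Ollama. The sketch below is an assumption-laden example: version_ge and driver_is_recent are made-up names, the minimum you pass in is your own choice rather than an official Ollama requirement, and it relies on sort -V (GNU coreutils) plus the nvidia-smi query flags.

```shell
#!/bin/sh
# version_ge A B
# Succeeds if dotted version A is greater than or equal to B.
# sort -V puts the smaller version first, so A >= B exactly when
# the first line of the sorted pair is B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# driver_is_recent MIN   (hypothetical helper)
# Reads the installed driver version of the first GPU from nvidia-smi
# and compares it with MIN. Returns 2 if nvidia-smi cannot be queried.
driver_is_recent() {
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader) || return 2
    installed=$(printf '%s\n' "$installed" | head -n 1)
    [ -n "$installed" ] || return 2
    version_ge "$installed" "$1"
}
```

For example, driver_is_recent 550 succeeds on a host whose driver reports 550.54.14 and fails on one that reports 470.82.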

---

Pitfall 3: AMD ROCm Driver Version Mismatch

AMD GPU users on Linux face a different problem. Ollama bundles ROCm 7 libraries, which require a compatible ROCm 7 kernel driver. If your system has ROCm 6.x or older, GPU discovery stalls until it times out after 30 seconds, and Ollama silently falls back to CPU mode.

I learned this the hard way when setting up Ollama on a workstation with a Radeon RX 7900 XTX. The official docs mentioned ROCm 7 support, but didn't emphasize that pre-existing ROCm installations wouldn't auto-upgrade.

How to identify the issue: Look for this exact message in server logs:

msg="failure during GPU discovery" ... error="failed to finish discovery before timeout"
msg="bootstrap discovery took" duration=30s
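Because the fallback is silent, it can help to test for that message automatically. A minimal sketch, assuming the log text is piped in on stdin; has_discovery_timeout is a made-up helper name and it matches only the error string quoted above.

```shell
#!/bin/sh
# has_discovery_timeout   (hypothetical helper)
# Reads log text on stdin and succeeds if it contains the GPU
# discovery timeout error quoted in this article.
has_discovery_timeout() {
    grep -q 'failed to finish discovery before timeout'
}
```

On a systemd host, one way to use it is: journalctl -u ollama --no-pager | has_discovery_timeout && echo "ROCm discovery timed out".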

Check your current ROCm version:

dpkg -l | grep rocm
# or ask rocm-smi for the kernel driver version
rocm-smi --showdriverversion

Solutions:

# Option 1: Update ROCm to version 7.x (preferred)
# Download from https://rocm.docs.amd.com/en/latest/deploy/linux/operating-systems.html

# Option 2: In Docker, pass device group IDs correctly
# Get the numeric group IDs:
ls -lnd /dev/kfd /dev/dri/card0

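# Example output: crw-rw---- 1 0 44 226, 0 Sep 16 16:55 /dev/dri/card0
# The group ID is the fourth field (44 here). "226, 0" is the device major/minor number, not a group.
# /dev/kfd may belong to a different group; add one --group-add per distinct ID.

# Pass them to the container (AMD GPUs are exposed with --device; --gpus all applies to NVIDIA):
docker run --device /dev/kfd --device /dev/dri --group-add 44 ollama/ollama:rocm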

Without proper device access permissions, Ollama detects the problem and refuses to use the AMD GPU, even though the hardware is physically present.
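Reading the owning group out of ls output by eye is error-prone, so here is a sketch that asks stat for it directly and builds the --group-add flags. It assumes GNU stat (the -c '%g' format); gid_of and group_add_args are made-up helper names.

```shell
#!/bin/sh
# gid_of PATH
# Prints the numeric ID of the group that owns PATH (GNU stat).
gid_of() {
    stat -c '%g' "$1"
}

# group_add_args PATH...   (hypothetical helper)
# Prints one "--group-add GID" per distinct owning group of the
# given device nodes, joined by spaces on a single line.
group_add_args() {
    for path in "$@"; do
        gid_of "$path"
    done | sort -un | sed 's/^/--group-add /' | paste -sd' ' -
}
```

With the ROCm image from Ollama's Docker instructions, the result can be spliced into the run command: docker run --device /dev/kfd --device /dev/dri $(group_add_args /dev/kfd /dev/dri/card0) ollama/ollama:rocm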

---

Pitfall 4: GPU Works at Startup Then Switches to CPU

This is the most insidious Ollama Docker issue. The container launches with GPU inference working fine. Then after 10-30 minutes of use, logs suddenly show "GPU discovery failed" and Ollama switches to CPU mid-operation—without any user intervention.

The culprit is systemd cgroup management conflicting with Ollama's GPU discovery mechanism during container runtime. I hit this on a machine running Ubuntu 22.04 with Docker 24.x.

Fix: Update the Docker daemon cgroup configuration. Edit /etc/docker/daemon.json:

{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

If the file already has content, merge carefully:

{
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
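Hand-merging JSON is an easy place to leave a stray comma, so the merge can also be done with jq. This is a sketch under assumptions: jq must be installed, merge_cgroupfs is a made-up helper name, and it replaces any existing native.cgroupdriver entry while leaving every other key untouched.

```shell
#!/bin/sh
# merge_cgroupfs   (hypothetical helper)
# Reads an existing daemon.json on stdin and writes the merged
# document to stdout. Any previous native.cgroupdriver=... entry in
# exec-opts is dropped before the cgroupfs setting is appended.
merge_cgroupfs() {
    jq '."exec-opts" = (((."exec-opts" // [])
            | map(select(startswith("native.cgroupdriver=") | not)))
            + ["native.cgroupdriver=cgroupfs"])'
}
```

One cautious way to apply it is to copy /etc/docker/daemon.json to daemon.json.bak first, then run merge_cgroupfs < /etc/docker/daemon.json.bak | sudo tee /etc/docker/daemon.json so the original stays available if the result is not what you expected.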

Restart Docker:

sudo systemctl restart docker

Verify the change took effect:

docker info | grep -i cgroup
# Should include: Cgroup Driver: cgroupfs

This fix is necessary because Ollama's GPU enumeration happens at runtime after container initialization, and the systemd cgroup driver can block the device discovery process.

---

Pitfall 5: Model Incompatibility After Ollama Version Upgrade

Ollama's model storage format can change between releases. After upgrading from 0.1.x to 0.5.x, I had three models that completely failed to load with "model format not supported" errors. The model files themselves were intact; Ollama just couldn't read the older format.

Safe upgrade procedure (non-destructive):

# Step 1: Check current version
ollama --version
# Output: ollama version is 0.1.5

# Step 2: Back up all model files before upgrading
# (a systemd install typically keeps them under /usr/share/ollama/.ollama/models instead)
cp -a ~/.ollama/models ~/.ollama/models.bak

# Step 3: Install a pinned version (the same command rolls back to an older release)
# Use this for compatibility-critical deployments
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh

# Step 4: Verify the installation
ollama --version

# Step 5: Once everything is confirmed working, optionally delete the backup
# Don't do this immediately - keep the backup for a few days
# rm -rf ~/.ollama/models.bak

For production deployments, I recommend pinning to a specific Ollama version using the environment variable method above, rather than always installing the latest release. An upgrade that spans many releases can change the model format, and the affected models then have to be downloaded again.
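A pin is only useful if something notices drift, so a deployment check can compare the installed version with the pinned one. The sketch below assumes the CLI prints a line containing a dotted version such as "ollama version is 0.5.7"; parse_ollama_version and matches_pin are made-up helper names.

```shell
#!/bin/sh
# parse_ollama_version   (hypothetical helper)
# Reads version output on stdin and prints the first x.y.z number found.
parse_ollama_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# matches_pin PINNED   (hypothetical helper)
# Succeeds only if the installed ollama binary reports exactly PINNED.
matches_pin() {
    installed=$(ollama --version 2>&1 | parse_ollama_version)
    [ "$installed" = "$1" ]
}
```

For instance, matches_pin 0.5.7 || echo "Ollama version drifted from the pin" could run in a health check or at container start.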

---

Ollama vs Cloud APIs: When to Choose Each

Based on my own deployments, here's the practical breakdown:

Choose Ollama when:

Data privacy is non-negotiable (model inputs never leave your network)
You need 24/7 inference with predictable, flat costs
You have capable GPU hardware (8GB+ VRAM recommended)
You want to run multiple models simultaneously without API rate limits

Choose cloud APIs when:

You need the latest models (GPT-4.5, Claude 3.7, etc.)
Traffic is highly variable (paying per request vs idle GPU 24/7)
You want zero infrastructure management
You need enterprise SLAs and compliance certifications

My home lab runs Ollama on a used RTX 3090 I bought for about $600. Electricity costs about $15/month for 24/7 operation. At current API prices, that same workload would cost $80-200/month via OpenAI. Net of electricity, that saves $65-185/month, so the card pays for itself in roughly 3 months at the heavy end of that usage and about 9 months at the light end.
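The break-even arithmetic is simple enough to rerun with your own numbers: divide the hardware cost by the monthly saving (API bill minus electricity). A small sketch using awk for the floating-point division; break_even_months is a made-up helper name and the inputs are the figures quoted above.

```shell
#!/bin/sh
# break_even_months HARDWARE_COST POWER_PER_MONTH API_PER_MONTH
# Prints how many months it takes for the monthly saving
# (API cost minus electricity) to repay the hardware, to one decimal.
# Prints "never" if running locally is not cheaper per month.
break_even_months() {
    awk -v hw="$1" -v power="$2" -v api="$3" 'BEGIN {
        saving = api - power
        if (saving <= 0) { print "never"; exit }
        printf "%.1f\n", hw / saving
    }'
}

break_even_months 600 15 200   # heavy use: 600 / 185, prints 3.2
break_even_months 600 15 80    # light use: 600 / 65, prints 9.2
```

The result is sensitive to the API bill: halving the monthly API spend more than doubles the payback time, because electricity is a fixed cost either way.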

---

Quick Checklist

Check	Command
View logs (Linux systemd)	journalctl -u ollama --no-pager --pager-end
View logs (Mac)	cat ~/.ollama/logs/server.log
View logs (Docker)	docker logs <container-name>
Confirm GPU visibility	nvidia-smi
Check ROCm version	dpkg -l | grep rocm
View LLM library loading	grep "Dynamic LLM libraries" ~/.ollama/logs/server.log
Force specific LLM library	OLLAMA_LLM_LIBRARY=cpu_avx2 ollama serve
