GPU and Docker Containerized Deployment Configuration Traps
Pitfall 1: Linux tmp noexec Breaks Model Loading
On some Linux distributions, Ollama writes temporary executable files to /tmp while loading a model. If /tmp is mounted with the noexec flag (a security-hardening practice common on shared hosting and some enterprise Linux setups), Ollama fails at runtime with no clear signal about the root cause.
I hit this myself on a Hetzner server running an Ubuntu 24.04 development environment. The error logs showed nothing obvious, and the model pull command just silently failed. After two hours of debugging, I found the issue by accident while checking disk space.
How to diagnose:
mount | grep /tmp
If you see /tmp on ... nosuid,noexec,nodev, that's your problem. The noexec flag prevents any executable files in /tmp from running—which includes Ollama's runtime binaries.
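The same check can be scripted so it runs before Ollama starts. The following is a minimal sketch, assuming a POSIX shell; the function name `has_noexec` is made up for illustration and is not part of Ollama.

```shell
# has_noexec: print "yes" if a comma-separated mount-options string
# contains the noexec flag, otherwise print "no".
# (Hypothetical helper for illustration, not an Ollama command.)
has_noexec() {
  case ",$1," in
    *,noexec,*) echo "yes" ;;
    *)          echo "no"  ;;
  esac
}

# Example usage on a live system (findmnt ships with util-linux):
#   has_noexec "$(findmnt -no OPTIONS --target /tmp)"
```

Wrapping the options in commas before matching avoids false positives from any option that merely contains the substring "noexec".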
The fix is straightforward: set OLLAMA_TMPDIR to a location where executables are allowed:
export OLLAMA_TMPDIR=/usr/share/ollama/
ollama serve
Note: /usr/share/ollama/ needs to be writable by the Ollama process. For a production server, create this directory and set ownership:
sudo mkdir -p /usr/share/ollama
sudo chown ollama:ollama /usr/share/ollama
In Docker environments, a cleaner approach is to mount an executable tmpfs:
docker run --gpus all --tmpfs /tmp:exec ollama/ollama
---
Pitfall 2: GPU Discovery Failure (NVIDIA CUDA Errors)
When Ollama can't find your NVIDIA GPU, it falls back to CPU inference—which is 10-50x slower depending on your model and GPU. The logs show cryptic error codes: 3 means "not initialized", 46 means "device unavailable", 100 means "no device found", and 999 means "unknown error".
I encountered error code 46 on a DigitalOcean GPU droplet. The GPU was visible to the host system but Ollama couldn't initialize it.
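A small lookup can make those codes readable when scanning logs. This is only a sketch: the function name `describe_cuda_error` is hypothetical, and it covers just the four codes listed above.

```shell
# describe_cuda_error: translate the numeric codes mentioned in this
# article into their short meaning. Anything else is reported as unknown
# to this helper (hypothetical function, not part of Ollama or CUDA).
describe_cuda_error() {
  case "$1" in
    3)   echo "not initialized" ;;
    46)  echo "device unavailable" ;;
    100) echo "no device found" ;;
    999) echo "unknown error" ;;
    *)   echo "code not covered in this article" ;;
  esac
}

# Example usage:
#   describe_cuda_error 46
```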
Troubleshooting steps that actually work:
# Step 1: Confirm docker can see the GPU at all
docker run --gpus all ubuntu nvidia-smi
# Step 2: Force reload the nvidia_uvm driver
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
# Step 3: Verify module is loaded
lsmod | grep nvidia
If those commands succeed but Ollama still fails, enable verbose CUDA logging:
CUDA_ERROR_LEVEL=50 ollama serve
Then check the systemd journal:
journalctl -u ollama --no-pager --pager-end | grep -i cuda
Ollama's official documentation confirms that running the latest NVIDIA drivers resolves most GPU discovery issues. On Ubuntu, you can update with:
sudo apt update && sudo apt install nvidia-driver-550
sudo reboot
The driver version matters more than the GPU generation. A Pascal-era GTX 1080 with a recent driver often works better than a newer GPU with outdated drivers.
---
Pitfall 3: AMD ROCm Driver Version Mismatch
AMD GPU users on Linux face a different problem. Ollama bundles ROCm 7 libraries, which require a compatible ROCm 7 kernel driver. If your system has ROCm 6.x or older, GPU discovery times out after 30 seconds and Ollama silently falls back to CPU mode.
I learned this the hard way when setting up Ollama on a workstation with a Radeon RX 7900 XTX. The official docs mentioned ROCm 7 support, but didn't emphasize that pre-existing ROCm installations wouldn't auto-upgrade.
How to identify the issue: Look for this exact message in server logs:
msg="failure during GPU discovery" ... error="failed to finish discovery before timeout"
msg="bootstrap discovery took" duration=30s
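To check for this without reading the log by eye, a saved log can be searched for the timeout string. This is a minimal sketch; the function name and the log path in the usage comment are placeholders, not anything Ollama provides.

```shell
# rocm_discovery_timed_out: exit 0 if the log file given as $1 contains
# the discovery-timeout error quoted above, non-zero otherwise.
# (Hypothetical helper for illustration.)
rocm_discovery_timed_out() {
  grep -q 'failed to finish discovery before timeout' "$1"
}

# Example usage with a saved log (the path is a placeholder):
#   journalctl -u ollama --no-pager > ollama.log
#   rocm_discovery_timed_out ollama.log && echo "check the ROCm driver version"
```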
Check your current ROCm version:
dpkg -l | grep rocm
# or query the kernel driver version directly
rocm-smi --showdriverversion
Solutions:
# Option 1: Update ROCm to version 7.x (preferred)
# Download from https://rocm.docs.amd.com/en/latest/deploy/linux/operating-systems.html
# Option 2: In Docker, use the ROCm image and pass the device group IDs
# Get the numeric group IDs:
ls -lnd /dev/kfd /dev/dri/card0
# Example output: crw-rw---- 1 0 44 226, 0 Sep 16 16:55 /dev/dri/card0
# The group ID is the fourth column (44 here); "226, 0" is the device major/minor number
# Repeat --group-add for each distinct group ID in the output:
docker run --device /dev/kfd --device /dev/dri --group-add 44 ollama/ollama:rocm
Without proper device access permissions, Ollama detects the problem and refuses to use the AMD GPU, even though the hardware is physically present.
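In `ls -ln` output the numeric group ID is the fourth column, so the `--group-add` flags can be derived from it rather than copied by hand. The sketch below assumes that column layout; `group_add_flags` is a hypothetical helper name.

```shell
# group_add_flags: read `ls -lnd` output on stdin and print one
# --group-add flag per distinct numeric group ID (4th column).
# (Hypothetical helper for illustration.)
group_add_flags() {
  awk '!seen[$4]++ { printf "%s--group-add %s", sep, $4; sep = " " }
       END { print "" }'
}

# Example usage:
#   ls -lnd /dev/kfd /dev/dri/card0 | group_add_flags
```

The output can then be pasted into the `docker run` command line. Duplicate group IDs are collapsed so each flag appears once.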
---
Pitfall 4: GPU Works at Startup Then Switches to CPU
This is the most insidious Ollama Docker issue. The container launches with GPU inference working fine. Then after 10-30 minutes of use, logs suddenly show "GPU discovery failed" and Ollama switches to CPU mid-operation—without any user intervention.
The culprit is systemd cgroup management conflicting with Ollama's GPU discovery mechanism during container runtime. I hit this on a machine running Ubuntu 22.04 with Docker 24.x.
Fix: Update the Docker daemon cgroup configuration. Edit /etc/docker/daemon.json:
{
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
If the file already has content, merge carefully:
{
"exec-opts": ["native.cgroupdriver=cgroupfs"],
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
Restart Docker:
sudo systemctl restart docker
Verify the change took effect:
docker info | grep -i cgroup
# Should show: Cgroup Driver: cgroupfs
This fix is necessary because Ollama's GPU enumeration happens at runtime after container initialization, and the systemd cgroup driver can block the device discovery process.
---
Pitfall 5: Model Incompatibility After Ollama Version Upgrade
Ollama's model storage format can change between releases. After upgrading from 0.1.x to 0.5.x, I had three models that completely failed to load with "model format not supported" errors. The models themselves were intact; Ollama just couldn't read the older format.
Safe upgrade procedure (non-destructive):
# Step 1: Check current version
ollama --version
# Output: ollama version is 0.1.5
# Step 2: Backup all model files before upgrading
cp -r ~/.ollama/models ~/.ollama/models.bak
# Step 3: Install specific version if you need to roll back
# Use this for compatibility-critical deployments
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh
# Step 4: Verify the installation
ollama --version
# Step 5: After confirmed working, optionally clean up old models
# Don't do this immediately - keep the backup for a few days
# rm -rf ~/.ollama/models.bak
For production deployments, I recommend pinning to a specific Ollama version using the environment variable method above, rather than always installing the latest release. In my experience, large version jumps can bring model-format changes that require re-downloading models.
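A pinned deployment can verify after each install that the running binary is the version it expects. The following is a sketch under the assumption that the version command prints a single X.Y.Z number somewhere in its output; both function names are hypothetical.

```shell
# extract_version: print the first X.Y.Z version number found in the text.
# (Hypothetical helper; relies on grep -o, available in GNU and BSD grep.)
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# version_matches_pin: exit 0 if the version found in $1 equals the pin in $2.
version_matches_pin() {
  [ "$(extract_version "$1")" = "$2" ]
}

# Example usage:
#   version_matches_pin "$(ollama --version)" "0.5.7" \
#     || echo "unexpected Ollama version" >&2
```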
---
Ollama vs Cloud APIs: When to Choose Each
Based on my own deployments, here's the practical breakdown:
Choose Ollama when:
- Data privacy is non-negotiable (model inputs never leave your network)
- You need 24/7 inference with predictable, flat costs
- You have capable GPU hardware (8GB+ VRAM recommended)
- You want to run multiple models simultaneously without API rate limits
Choose cloud APIs when:
- You need the latest models (GPT-4.5, Claude 3.7, etc.)
- Traffic is highly variable (paying per request vs idle GPU 24/7)
- You want zero infrastructure management
- You need enterprise SLAs and compliance certifications
My home lab runs Ollama on a used RTX 3090 I bought for about $600. Electricity costs about $15/month for 24/7 operation. At current API prices, that same workload would cost $80-200/month via OpenAI. The break-even point for local GPU deployment is roughly 3-6 months of heavy use.
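The break-even arithmetic can be checked with the figures above: divide the hardware cost by the monthly saving (API spend minus electricity). The helper below is a sketch with a hypothetical name; it ignores resale value, hardware failure, and your own time.

```shell
# break_even_months: months until a one-time hardware cost is recovered.
# Args: hardware_cost  api_cost_per_month  electricity_per_month
# Prints "never" when the API is not more expensive than the electricity.
# (Hypothetical helper for illustration.)
break_even_months() {
  awk -v hw="$1" -v api="$2" -v power="$3" 'BEGIN {
    if (api <= power) { print "never"; exit }
    printf "%.1f\n", hw / (api - power)
  }'
}

# With the numbers from this article:
#   break_even_months 600 200 15   # prints 3.2
#   break_even_months 600 80 15    # prints 9.2
```

At $200/month of API spend the card pays for itself in about 3.2 months; at $80/month it takes about 9.2. A 6-month break-even corresponds to roughly $115/month, so the 3-6 month estimate applies to the heavier end of that usage range.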
---
Quick Checklist
| Check | Command |
|---|---|
| View logs (Linux systemd) | journalctl -u ollama --no-pager --pager-end |
| View logs (Mac) | cat ~/.ollama/logs/server.log |
| View logs (Docker) | docker logs <container-name> |
| Confirm GPU visibility | nvidia-smi |
| Check ROCm version | dpkg -l \| grep rocm |
| View LLM library loading | grep "Dynamic LLM libraries" ~/.ollama/logs/server.log |
| Force specific LLM library | OLLAMA_LLM_LIBRARY=cpu_avx2 ollama serve |
---