Running Ollama in Docker lets you deploy local LLMs on any machine or server without installing anything directly on the host. It’s the cleanest approach for server deployments, CI pipelines, or anyone who wants a portable, reproducible AI environment.
Prerequisites
You’ll need Docker installed. For GPU support you’ll also need the NVIDIA Container Toolkit (NVIDIA) or ROCm (AMD). Note that Docker containers on Apple Silicon cannot access the GPU, so on a Mac the container runs CPU-only; for GPU acceleration there, run Ollama natively instead.
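A quick way to confirm the prerequisites are in place before starting (a small sketch; `check` is just a hypothetical convenience function):

```shell
#!/bin/sh
# Report whether a command is available on PATH
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check docker        # required for everything below
check nvidia-smi    # only needed for NVIDIA GPU setups
```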
Basic CPU Setup
Pull and run the official Ollama image:
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
Then pull a model into the running container:
docker exec -it ollama ollama pull llama3.1
Test it:
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Hello!", "stream": false}'
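If you’d rather script the smoke test than use curl, the same request can be made from Python with only the standard library (a minimal sketch; the model name and port match the commands above):

```python
import json
import urllib.request

def build_payload(model, prompt):
    # Non-streaming request body for Ollama's /api/generate endpoint
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.1", base_url="http://localhost:11434"):
    """Send a generate request to the container and return the response text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Hello!"))
```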
NVIDIA GPU Setup
First install the NVIDIA Container Toolkit, then run:
docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
AMD GPU Setup
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
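ROCm officially supports only a subset of AMD GPUs. For some unsupported consumer cards, a commonly used workaround is overriding the detected GPU architecture via an environment variable (shown here with the value often used for RDNA2 cards; treat the exact value as an assumption to verify against your GPU):

```shell
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```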
Using Docker Compose
For a persistent, easily managed setup, use a docker-compose.yml:
CPU only
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

volumes:
  ollama_data:
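Optionally, a healthcheck lets Docker report whether the server is actually responding rather than just whether the process is running (a sketch using `ollama list` as a cheap readiness probe; tune the timings to your hardware):

```yaml
services:
  ollama:
    # ...same service definition as above, plus:
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
```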
With NVIDIA GPU
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
Start with:
docker compose up -d
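Once the stack is up, the usual Compose subcommands apply (the model name here is just an example):

```shell
docker compose logs -f ollama                      # follow startup output
docker compose exec ollama ollama pull llama3.1    # pull a model into the running service
docker compose down                                # stop and remove containers (named volumes are kept)
```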
Adding Open WebUI
Pair Ollama with Open WebUI for a ChatGPT-style interface in your browser. Add it to your compose file:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open_webui_data:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:
Open WebUI will be available at http://localhost:3000.
Pre-pulling Models at Startup
To automatically pull models when the container starts, use an init script:
#!/bin/bash
# init-models.sh — start the server, wait until it's ready, then pull models
ollama serve &

# Poll the API instead of sleeping a fixed interval; the server can take
# longer than a few seconds to start on slower machines.
until ollama list >/dev/null 2>&1; do
  sleep 1
done

ollama pull llama3.1
ollama pull nomic-embed-text
wait
Then mount the script and override the entrypoint in your docker-compose.yml:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
      - ./init-models.sh:/init-models.sh
    ports:
      - "11434:11434"
    entrypoint: ["/bin/bash", "/init-models.sh"]
    restart: unless-stopped

volumes:
  ollama_data:
Connecting From Other Containers
When other containers on the same Docker network need to call Ollama, use the service name as the hostname:
# From Python in another container on the same network
# (requires the ollama package: pip install ollama)
import ollama

# "ollama" resolves to the service name from docker-compose.yml
client = ollama.Client(host='http://ollama:11434')
response = client.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response['message']['content'])
Useful Docker Commands
# View logs
docker logs ollama
# List pulled models
docker exec ollama ollama list
# Pull a new model
docker exec ollama ollama pull mistral
# Remove a model to free disk space
docker exec ollama ollama rm mistral
# Stop and remove container (keeps volume data)
docker stop ollama && docker rm ollama
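Models live in the named volume, so they survive container removal. To move them to another machine, you can tar the volume through a throwaway container (a sketch; the volume name `ollama` matches the docker run examples above — Compose setups use `ollama_data`, usually with a project prefix):

```shell
# Back up the volume to ollama-backup.tar.gz in the current directory
docker run --rm -v ollama:/source -v "$(pwd)":/backup alpine \
  tar czf /backup/ollama-backup.tar.gz -C /source .

# Restore into a (new) volume
docker run --rm -v ollama:/target -v "$(pwd)":/backup alpine \
  tar xzf /backup/ollama-backup.tar.gz -C /target
```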
Choosing a Model
For server deployments, the right model depends on your hardware and use case. See the guides to best models for coding, best models for RAG, and best models for summarisation.
Next Steps
With Ollama running in Docker, the next logical step is calling it from Python to build applications, or using LangChain to build a RAG pipeline.


