Running Ollama in Docker lets you deploy local LLMs on any machine or server without installing anything directly on the host. It’s the cleanest approach for server deployments, CI pipelines, or anyone who wants a portable, reproducible AI environment.
Prerequisites
You’ll need Docker installed. For GPU support you’ll also need the NVIDIA Container Toolkit (NVIDIA) or ROCm (AMD). Note that Docker containers on Apple Silicon cannot access the GPU, so on a Mac the container runs CPU-only; for GPU acceleration there, run Ollama natively instead.
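A quick way to confirm the prerequisites are in place before starting (a small sketch; `check` is just a hypothetical convenience function):

```shell
#!/bin/sh
# Report whether a command is available on PATH
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check docker        # required for everything below
check nvidia-smi    # only needed for NVIDIA GPU setups
```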
Basic CPU Setup
Pull and run the official Ollama image:
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
Then pull a model into the running container:
docker exec -it ollama ollama pull llama3.1
Test it:
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Hello!", "stream": false}'
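If you’d rather script the smoke test than use curl, the same request can be made from Python with only the standard library (a minimal sketch; the model name and port match the commands above):

```python
import json
import urllib.request

def build_payload(model, prompt):
    # Non-streaming request body for Ollama's /api/generate endpoint
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.1", base_url="http://localhost:11434"):
    """Send a generate request to the container and return the response text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Hello!"))
```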
NVIDIA GPU Setup
First install the NVIDIA Container Toolkit, then run:
docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
AMD GPU Setup
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
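ROCm officially supports only a subset of AMD GPUs. For some unsupported consumer cards, a commonly used workaround is overriding the detected GPU architecture via an environment variable (shown here with the value often used for RDNA2 cards; treat the exact value as an assumption to verify against your GPU):

```shell
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```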
Using Docker Compose
For a persistent, easily managed setup, use a docker-compose.yml:
CPU only
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

volumes:
  ollama_data:
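Optionally, a healthcheck lets Docker report whether the server is actually responding rather than just whether the process is running (a sketch using `ollama list` as a cheap readiness probe; tune the timings to your hardware):

```yaml
services:
  ollama:
    # ...same service definition as above, plus:
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
```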
With NVIDIA GPU
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
Start with:
docker compose up -d
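Once the stack is up, the usual Compose subcommands apply (the model name here is just an example):

```shell
docker compose logs -f ollama                      # follow startup output
docker compose exec ollama ollama pull llama3.1    # pull a model into the running service
docker compose down                                # stop and remove containers (named volumes are kept)
```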
Adding Open WebUI
Pair Ollama with Open WebUI for a ChatGPT-style interface in your browser. Add it to your compose file:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open_webui_data:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:
Open WebUI will be available at http://localhost:3000.
Pre-pulling Models at Startup
To automatically pull models when the container starts, use an init script:
#!/bin/bash
# init-models.sh — start the server, wait until it's ready, then pull models
ollama serve &

# Poll the API instead of sleeping a fixed interval; the server can take
# longer than a few seconds to start on slower machines.
until ollama list >/dev/null 2>&1; do
  sleep 1
done

ollama pull llama3.1
ollama pull nomic-embed-text
wait
Then mount the script and override the entrypoint in your docker-compose.yml:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
      - ./init-models.sh:/init-models.sh
    ports:
      - "11434:11434"
    entrypoint: ["/bin/bash", "/init-models.sh"]
    restart: unless-stopped

volumes:
  ollama_data:
Connecting From Other Containers
When other containers on the same Docker network need to call Ollama, use the service name as the hostname:
# From Python in another container on the same network
# (requires the ollama package: pip install ollama)
import ollama

# "ollama" resolves to the service name from docker-compose.yml
client = ollama.Client(host='http://ollama:11434')
response = client.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response['message']['content'])
Useful Docker Commands
# View logs
docker logs ollama
# List pulled models
docker exec ollama ollama list
# Pull a new model
docker exec ollama ollama pull mistral
# Remove a model to free disk space
docker exec ollama ollama rm mistral
# Stop and remove container (keeps volume data)
docker stop ollama && docker rm ollama
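Models live in the named volume, so they survive container removal. To move them to another machine, you can tar the volume through a throwaway container (a sketch; the volume name `ollama` matches the docker run examples above — Compose setups use `ollama_data`, usually with a project prefix):

```shell
# Back up the volume to ollama-backup.tar.gz in the current directory
docker run --rm -v ollama:/source -v "$(pwd)":/backup alpine \
  tar czf /backup/ollama-backup.tar.gz -C /source .

# Restore into a (new) volume
docker run --rm -v ollama:/target -v "$(pwd)":/backup alpine \
  tar xzf /backup/ollama-backup.tar.gz -C /target
```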
Choosing a Model
For server deployments, the right model depends on your hardware and use case. See the guides to best models for coding, best models for RAG, and best models for summarisation.
Next Steps
With Ollama running in Docker, the next logical step is calling it from Python to build applications, or using LangChain to build a RAG pipeline.


