LangChain is the most widely used framework for building LLM-powered applications. Combining it with Ollama gives you a fully local, private AI pipeline — no API keys, no data leaving your machine, no per-token costs. This guide covers the essentials: basic calls, chains, and a working RAG pipeline.
Prerequisites
pip install langchain langchain-ollama langchain-community chromadb
Make sure Ollama is running with at least one model pulled:
ollama pull llama3.1
ollama pull nomic-embed-text # for embeddings / RAG
Basic Chat with LangChain and Ollama
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1")
response = llm.invoke("What is the difference between RAM and storage?")
print(response.content)
Streaming
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1")
for chunk in llm.stream("Explain Docker in simple terms."):
    print(chunk.content, end="", flush=True)
Prompt Templates
Use prompt templates to keep your prompts reusable and structured:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOllama(model="llama3.1")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains technical concepts simply."),
    ("human", "Explain {concept} as if I'm a complete beginner.")
])
chain = prompt | llm
response = chain.invoke({"concept": "vector databases"})
print(response.content)
Building a RAG Pipeline
Retrieval-Augmented Generation (RAG) lets your model answer questions based on your own documents. Here’s a complete working example using a local vector store.
For model recommendations for RAG, see the best Ollama models for RAG.
Step 1: Load and split documents
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load a single file
loader = TextLoader("my_document.txt")
docs = loader.load()
# Or load all .txt files from a folder
# loader = DirectoryLoader("./docs", glob="**/*.txt")
# docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
Step 2: Create embeddings and vector store
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
Step 3: Build the retrieval chain
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
llm = ChatOllama(model="llama3.1")
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the context below.
If you don't know the answer from the context, say so.
Context: {context}
Question: {question}
""")
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
answer = chain.invoke("What are the main topics covered in the document?")
print(answer)
Step 4: Load an existing vector store
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OllamaEmbeddings(model="nomic-embed-text")
# Load previously persisted store
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
Conversation Memory
Add memory to maintain context across multiple questions:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
llm = ChatOllama(model="llama3.1")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])
chain = prompt | llm
store = {}
def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)
config = {"configurable": {"session_id": "user_1"}}
print(chain_with_history.invoke({"input": "My name is Alice."}, config=config).content)
print(chain_with_history.invoke({"input": "What's my name?"}, config=config).content)
Choosing Models for LangChain + Ollama
- Chat / reasoning: llama3.1 or mistral (see the Llama 3 vs Mistral comparison)
- Embeddings: nomic-embed-text or mxbai-embed-large
- Code generation: deepseek-coder-v2 (see the best models for coding)
Next Steps
Once you have a working LangChain + Ollama setup, consider packaging it with Docker for portable deployment, or building a web interface using FastAPI and the Ollama Python library.


