LangChain is the most widely used framework for building LLM-powered applications. Combining it with Ollama gives you a fully local, private AI pipeline — no API keys, no data leaving your machine, no per-token costs. This guide covers the essentials: basic calls, chains, and a working RAG pipeline.
Prerequisites
pip install langchain langchain-ollama langchain-community chromadb
Make sure Ollama is running with at least one model pulled:
ollama pull llama3.1
ollama pull nomic-embed-text # for embeddings / RAG
Basic Chat with LangChain and Ollama
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1")
response = llm.invoke("What is the difference between RAM and storage?")
print(response.content)
Streaming
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1")
for chunk in llm.stream("Explain Docker in simple terms."):
    print(chunk.content, end="", flush=True)
Prompt Templates
Use prompt templates to keep your prompts reusable and structured:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOllama(model="llama3.1")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains technical concepts simply."),
    ("human", "Explain {concept} as if I'm a complete beginner.")
])
chain = prompt | llm
response = chain.invoke({"concept": "vector databases"})
print(response.content)
Building a RAG Pipeline
Retrieval-Augmented Generation (RAG) lets your model answer questions based on your own documents. Here’s a complete working example using a local vector store.
For model recommendations for RAG, see the best Ollama models for RAG.
Step 1: Load and split documents
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load a single file
loader = TextLoader("my_document.txt")
docs = loader.load()
# Or load all .txt files from a folder
# loader = DirectoryLoader("./docs", glob="**/*.txt")
# docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
Step 2: Create embeddings and vector store
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
Step 3: Build the retrieval chain
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
llm = ChatOllama(model="llama3.1")
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the context below.
If you don't know the answer from the context, say so.
Context: {context}
Question: {question}
""")
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
answer = chain.invoke("What are the main topics covered in the document?")
print(answer)
Step 4: Load an existing vector store
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OllamaEmbeddings(model="nomic-embed-text")
# Load previously persisted store
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
Conversation Memory
Add memory to maintain context across multiple questions:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
llm = ChatOllama(model="llama3.1")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])
chain = prompt | llm
store = {}
def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)
config = {"configurable": {"session_id": "user_1"}}
print(chain_with_history.invoke({"input": "My name is Alice."}, config=config).content)
print(chain_with_history.invoke({"input": "What's my name?"}, config=config).content)
Choosing Models for LangChain + Ollama
- Chat / reasoning: llama3.1 or mistral (see the Llama 3 vs Mistral comparison)
- Embeddings: nomic-embed-text or mxbai-embed-large
- Code generation: deepseek-coder-v2 (see the best models for coding)
Next Steps
Once you have a working LangChain + Ollama setup, consider packaging it with Docker for portable deployment, or building a web interface using FastAPI and the Ollama Python library.


