Need to summarise long documents, articles, or meeting notes locally? These are the best Ollama models for summarisation in 2026 — tested for accuracy, speed, and instruction-following ability.
What Makes a Good Summarisation Model?
Not all language models handle summarisation equally. The best models for this task share a few key traits: strong instruction-following, large context windows to handle long inputs, and the ability to extract key information without hallucinating details.
Top Ollama Models for Summarisation
1. Llama 3.1 8B — Best Overall
Meta’s Llama 3.1 8B is the go-to model for summarisation tasks on Ollama. It handles long documents well, follows instructions precisely, and produces clean, concise summaries without adding information that wasn’t in the source text.
```shell
ollama run llama3.1
```
Best for: General documents, articles, reports
Context window: 128K tokens
RAM required: 8GB minimum
2. Mistral 7B — Fastest Option
Mistral 7B is noticeably faster than Llama 3.1, with only a modest drop in summary quality. If you’re summarising large volumes of text and speed matters, Mistral is an excellent choice. It produces tight, accurate summaries and handles structured documents particularly well.
```shell
ollama run mistral
```
Best for: High-volume summarisation, batch processing
Context window: 32K tokens
RAM required: 8GB minimum
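For high-volume work, Mistral pairs well with a simple shell loop. Here is a minimal sketch, assuming your documents sit in a hypothetical `notes/` directory (the `ollama` call is skipped entirely if the CLI isn’t installed):

```shell
# Summarise every .txt file in notes/ and write results to summaries/
# (notes/ is a hypothetical example directory)
mkdir -p notes summaries

if command -v ollama >/dev/null 2>&1; then
  for f in notes/*.txt; do
    [ -e "$f" ] || continue  # skip if the glob matched nothing
    ollama run mistral "Summarise the following text in 3 sentences:" < "$f" \
      > "summaries/$(basename "$f" .txt).summary.txt"
  done
fi
```

Each summary lands in its own file, so the loop is easy to resume or parallelise if a batch is interrupted.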
3. Qwen2.5 14B — Best for Long Documents
Qwen2.5 14B excels with very long documents. Its large context window and strong instruction-following make it ideal for summarising lengthy reports, legal documents, or research papers in a single pass.
```shell
ollama run qwen2.5:14b
```
Best for: Long documents, technical content
Context window: 128K tokens
RAM required: 16GB minimum
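One caveat: Ollama loads models with a modest default context window (typically 2,048–4,096 tokens) regardless of the model’s maximum, so very long inputs can be silently truncated. One way to raise it is a custom Modelfile — a sketch below, where the variant name `qwen2.5-long` and the 32K value are our own choices, not Ollama defaults:

```shell
# Modelfile that raises num_ctx so long documents are not truncated
cat > Modelfile <<'EOF'
FROM qwen2.5:14b
PARAMETER num_ctx 32768
EOF

# Build the long-context variant (skipped if ollama is not installed)
if command -v ollama >/dev/null 2>&1; then
  ollama create qwen2.5-long -f Modelfile
fi
```

Note that a larger `num_ctx` increases memory use, so raise it only as far as your RAM allows.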
4. Phi-4 — Best for Low-Resource Machines
Microsoft’s Phi-4 punches well above its weight for summarisation. If you’re running on a machine with limited RAM, Phi-4 delivers surprisingly strong summaries with a much smaller footprint than larger models.
```shell
ollama run phi4
```
Best for: Low RAM setups, quick summaries
Context window: 16K tokens
RAM required: 6GB minimum
5. Gemma 2 9B — Best for Bullet Point Summaries
Google’s Gemma 2 9B follows formatting instructions exceptionally well. Ask it for a bullet-point summary or an executive summary and it consistently delivers clean, structured output — perfect for business use cases.
```shell
ollama run gemma2:9b
```
Best for: Structured summaries, business documents
Context window: 8K tokens
RAM required: 10GB minimum
Quick Comparison
| Model | Speed | Quality | Context | RAM |
|---|---|---|---|---|
| Llama 3.1 8B | Fast | Excellent | 128K | 8GB |
| Mistral 7B | Very Fast | Good | 32K | 8GB |
| Qwen2.5 14B | Medium | Excellent | 128K | 16GB |
| Phi-4 | Fast | Good | 16K | 6GB |
| Gemma 2 9B | Fast | Very Good | 8K | 10GB |
How to Get Better Summaries from Any Model
Regardless of which model you choose, prompting makes a big difference. Be specific about the output format you want:
```
Summarise the following text in 3 bullet points. Focus on the key findings and any action items.

Text: [your text here]
```
Specifying length, format, and focus area consistently improves output quality across all models.
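On the command line, one way to apply a prompt like this is to pipe the document in alongside it — `ollama run` appends piped stdin to the prompt argument. A minimal sketch, where `report.txt` stands in for your own file and the model call runs only if the CLI is installed:

```shell
# Create a small sample document (replace with your own file)
cat > report.txt <<'EOF'
Q3 revenue grew 12% year on year. The team agreed to ship the
redesign by November and to hire two more support engineers.
EOF

PROMPT="Summarise the following text in 3 bullet points. Focus on the key findings and any action items."

# Piped stdin is appended after the prompt argument
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.1 "$PROMPT" < report.txt
fi
```

The same pattern works with any of the models above — swap `llama3.1` for `mistral`, `qwen2.5:14b`, `phi4`, or `gemma2:9b`.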
Our Recommendation
For most users, Llama 3.1 8B is the best starting point — it balances quality, speed, and hardware requirements well. If you regularly work with very long documents, upgrade to Qwen2.5 14B. If you’re on a low-spec machine, Phi-4 won’t let you down.
For more Ollama guides, see our complete Ollama help centre.