Cheapest LLM for summarization
Summarization is almost pure input: you feed in long documents and get back a short digest, so input price and context window size dominate the choice. The models below all have at least a 200K-token context window and are ranked cheapest first for that read-heavy load.
Cheapest models for summarization
Monthly cost for a summarizer reading ~80M input tokens and writing ~4M output tokens. Sorted cheapest first.
| # | Model | Provider | Context (tokens) | Input ($/1M tokens) | Output ($/1M tokens) | Monthly cost |
|---|---|---|---|---|---|---|
| 1 | Qwen-Flash | Alibaba | 1M | $0.05 | $0.40 | $5.60 |
| 2 | Amazon Nova Lite | Amazon | 300K | $0.06 | $0.24 | $5.76 |
| 3 | Llama 4 Scout (17B-16E Instruct) | Meta | 10M | $0.10 | $0.30 | $9.20 |
| 4 | Qwen3.5-Flash | Alibaba | 1M | $0.10 | $0.40 | $9.60 |
| 5 | Ministral 3 8B | Mistral | 256K | $0.15 | $0.15 | $12.60 |
| 6 | Llama 4 Maverick (17B-128E Instruct) | Meta | 1M | $0.15 | $0.60 | $14.40 |
| 7 | Mistral Small 4 | Mistral | 256K | $0.15 | $0.60 | $14.40 |
| 8 | GPT-5.4 nano | OpenAI | 400K | $0.20 | $1.25 | $21.00 |
| 9 | Gemini 3.1 Flash-Lite | Google | 1M | $0.25 | $1.50 | $26.00 |
| 10 | Qwen3.6-Flash | Alibaba | 1M | $0.25 | $1.50 | $26.00 |
| 11 | Amazon Nova 2 Lite | Amazon | 1M | $0.30 | $2.50 | $34.00 |
| 12 | Qwen-Plus (Qwen3-series) | Alibaba | 1M | $0.40 | $1.20 | $36.80 |
Estimate only; excludes prompt caching, batch discounts and free tiers. Different volumes change the ranking, so run your own numbers. Prices verified against official docs · catalog updated 2026-06-28.
Summarization is the most lopsided workload — long source in, short summary out (~20:1). We rank an 80M-in / 4M-out monthly mix and require ≥200K context so whole documents fit in a single pass instead of being chunked.
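To see how sensitive the ranking is to that ratio, here is a minimal sketch that solves for the input:output mix at which two models cost the same. The function name `breakeven_ratio` is illustrative only, and the prices are the list prices of the two cheapest models in the table, which may have changed since the catalog date.

```python
from fractions import Fraction

def breakeven_ratio(in_a, out_a, in_b, out_b):
    """Input:output token ratio at which models A and B cost the same.

    Solves in_a*I + out_a*O == in_b*I + out_b*O for I/O.
    Returns None when the input prices are equal (no single crossover).
    A negative result means one model is cheaper at every mix.
    """
    # Convert via str() so 0.05 becomes exactly 1/20 rather than a binary float.
    in_a, out_a, in_b, out_b = (Fraction(str(p)) for p in (in_a, out_a, in_b, out_b))
    if in_a == in_b:
        return None
    return (out_a - out_b) / (in_b - in_a)

# Qwen-Flash ($0.05 in, $0.40 out) vs Amazon Nova Lite ($0.06 in, $0.24 out)
ratio = breakeven_ratio(0.05, 0.40, 0.06, 0.24)
print(float(ratio))  # 16.0
```

Under these prices, Qwen-Flash is cheaper when the workload is more input-heavy than 16:1, and Amazon Nova Lite is cheaper below that. The 20:1 mix used here sits just above the crossover, which is why the gap between the two is only $0.16/month.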
What is the cheapest LLM for summarization?
Qwen-Flash (Alibaba) is the cheapest generally available model we track for summarization, at $0.05 per 1M input tokens and $0.40 per 1M output tokens. That comes to about $5.60/month for a summarizer reading ~80M input tokens and writing ~4M output tokens. Amazon Nova Lite is the next cheapest at $5.76/month.
How is "cheapest for summarization" calculated?
We price a representative monthly workload (~80M input tokens and ~4M output tokens) against every generally available model we track, then rank by total cost: input tokens times the input price, plus output tokens times the output price. Only models with at least a 200K-token context window are included. All prices are USD per 1M tokens, sourced from official provider documentation.
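The calculation can be sketched in a few lines, assuming a hand-copied catalog of the three cheapest models from this page. `CATALOG`, `monthly_cost` and `rank` are hypothetical names for illustration, not part of any published tool, and the prices should be rechecked against provider documentation before use.

```python
from decimal import Decimal

# (model, context window in tokens, input $/1M, output $/1M), copied from this page.
CATALOG = [
    ("Amazon Nova Lite", 300_000, "0.06", "0.24"),
    ("Qwen-Flash", 1_000_000, "0.05", "0.40"),
    ("Llama 4 Scout", 10_000_000, "0.10", "0.30"),
]

def monthly_cost(input_price, output_price, input_mtok=80, output_mtok=4):
    """USD cost for a month of input_mtok / output_mtok million tokens."""
    return Decimal(input_price) * input_mtok + Decimal(output_price) * output_mtok

def rank(catalog, min_context=200_000):
    """Drop models under the context floor, then sort by monthly cost."""
    eligible = [m for m in catalog if m[1] >= min_context]
    priced = ((name, monthly_cost(inp, out)) for name, _, inp, out in eligible)
    return sorted(priced, key=lambda row: row[1])

for name, cost in rank(CATALOG):
    print(f"{name}: ${cost:.2f}")
```

Changing `input_mtok` and `output_mtok` to your own volumes is the quickest way to check whether the ordering still holds for your workload. `Decimal` is used so that the totals match the table to the cent.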
Is the cheapest model always the right choice for summarization?
No. Price is one axis; quality, latency, rate limits and reliability matter too. Use this ranking to shortlist, then test the top candidates on your own summarization workload before committing. Cost is easy to measure — fit is not.