Question 1

What is the cheapest LLM for RAG?

Accepted Answer

Llama 3.1 8B Instruct (Meta) is the cheapest generally available model we track for RAG, at $0.02 per 1M input tokens and $0.03 per 1M output tokens. For a RAG app that sends roughly 50M input tokens and 5M output tokens a month, that comes to about $1.15/month. Amazon Nova Micro is the next cheapest at $2.45/month.
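Both monthly figures come from a simple linear cost model. Writing T_in and T_out for monthly input and output tokens, and p_in and p_out for the USD price per 1M tokens (this notation is ours, not the source's), the arithmetic works out as:

```latex
\begin{aligned}
\text{monthly cost} &= \frac{T_{\text{in}}}{10^{6}}\,p_{\text{in}} + \frac{T_{\text{out}}}{10^{6}}\,p_{\text{out}} \\
\text{Llama 3.1 8B Instruct:}\quad &50 \times \$0.02 + 5 \times \$0.03 = \$1.00 + \$0.15 = \$1.15 \\
\text{Amazon Nova Micro:}\quad &50 \times \$0.035 + 5 \times \$0.14 = \$1.75 + \$0.70 = \$2.45
\end{aligned}
```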

Question 2

How is "cheapest for RAG" calculated?

Accepted Answer

We price a representative monthly workload, a RAG app that sends roughly 50M input tokens and 5M output tokens a month, against every generally available model we track, then rank the models by total monthly cost. The workload is input-heavy because RAG prompts carry retrieved passages as context. Only models with a context window of at least 128K tokens are included. All prices are in USD per 1M tokens and are sourced from official provider documentation.
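A minimal sketch of that ranking in Python, assuming the 50M/5M workload and the 128K context cutoff described above. Prices for the six real models are copied from the table on this page; `hypothetical-32k-model` is invented purely to show the context filter at work, and `Model`, `monthly_cost` and `rank_for_rag` are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Representative workload: ~50M input and ~5M output tokens per month.
MONTHLY_INPUT_TOKENS = 50_000_000
MONTHLY_OUTPUT_TOKENS = 5_000_000
# Minimum context window, in thousands of tokens ("128K" as listed).
MIN_CONTEXT_K = 128


@dataclass(frozen=True)
class Model:
    name: str
    context_k: int           # context window in thousands of tokens
    input_usd_per_m: float   # USD per 1M input tokens
    output_usd_per_m: float  # USD per 1M output tokens


def monthly_cost(model: Model,
                 input_tokens: int = MONTHLY_INPUT_TOKENS,
                 output_tokens: int = MONTHLY_OUTPUT_TOKENS) -> float:
    """Total USD for one month: token volume (in millions) times per-1M price."""
    return (input_tokens / 1_000_000 * model.input_usd_per_m
            + output_tokens / 1_000_000 * model.output_usd_per_m)


def rank_for_rag(models: list[Model],
                 min_context_k: int = MIN_CONTEXT_K,
                 input_tokens: int = MONTHLY_INPUT_TOKENS,
                 output_tokens: int = MONTHLY_OUTPUT_TOKENS) -> list[tuple[str, float]]:
    """Drop models below the context cutoff, then sort the rest by monthly cost."""
    eligible = [m for m in models if m.context_k >= min_context_k]
    return sorted(((m.name, monthly_cost(m, input_tokens, output_tokens)) for m in eligible),
                  key=lambda pair: pair[1])


MODELS = [
    Model("Llama 3.1 8B Instruct", 128, 0.02, 0.03),
    Model("Amazon Nova Micro", 128, 0.035, 0.14),
    Model("Amazon Nova Lite", 300, 0.06, 0.24),
    Model("Qwen-Flash", 1_000, 0.05, 0.40),
    Model("Llama 4 Scout (17B-16E Instruct)", 10_000, 0.10, 0.30),
    Model("Llama 3.3 70B Instruct", 128, 0.10, 0.32),
    # Hypothetical, NOT from the table: cheapest per token but only 32K of
    # context, so the 128K filter should exclude it from the ranking.
    Model("hypothetical-32k-model", 32, 0.01, 0.01),
]

for name, cost in rank_for_rag(MODELS):
    print(f"{name}: ${cost:.2f}/month")
```

Because the token volumes are parameters, you can re-rank the same models with your own monthly input and output counts before moving on to quality testing.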

Question 3

Is the cheapest model always the right choice for RAG?

Accepted Answer

No. Price is only one axis; quality, latency, rate limits and reliability matter too. Use this ranking to build a shortlist, then test the top candidates on your own RAG workload before committing. Cost is easy to measure; fit is not.

#	Model	Provider	Context window	Input (USD/1M tokens)	Output (USD/1M tokens)	Monthly cost (USD, 50M in / 5M out)
1	Llama 3.1 8B Instruct	Meta	128K	$0.02	$0.03	$1.15
2	Amazon Nova Micro	Amazon	128K	$0.035	$0.14	$2.45
3	Command R7B	Cohere	128K	$0.0375	$0.15	$2.63
4	Amazon Nova Lite	Amazon	300K	$0.06	$0.24	$4.20
5	Qwen-Flash	Alibaba	1M	$0.05	$0.40	$4.50
6	Llama 4 Scout (17B-16E Instruct)	Meta	10M	$0.10	$0.30	$6.50
7	Llama 3.3 70B Instruct	Meta	128K	$0.10	$0.32	$6.60
8	Qwen3.5-Flash	Alibaba	1M	$0.10	$0.40	$7.00
9	Ministral 3 8B	Mistral	256K	$0.15	$0.15	$8.25
10	Llama 4 Maverick (17B-128E Instruct)	Meta	1M	$0.15	$0.60	$10.50
11	Mistral Small 4	Mistral	256K	$0.15	$0.60	$10.50
12	Command R (08-2024)	Cohere	128K	$0.15	$0.60	$10.50
