
Cheapest LLM for vision

Vision requests send images, which are tokenized and billed as input tokens, and get text back, so input price dominates the cost. Only models that accept image input qualify. These are the cheapest generally available vision-capable models, ranked by monthly cost on a typical workload.

The cheapest pick: Ministral 3 3B
$1.00/mo for an image-understanding workload of ~20M input and ~5M output tokens a month · $0.04 in / $0.04 out per 1M · Mistral
The ranking

Cheapest models for vision

Monthly cost for an image-understanding workload of ~20M input and ~5M output tokens a month. Sorted cheapest first.

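| # | Model | Provider | Context | Input $/1M | Output $/1M | Monthly cost |
|---|---|---|---|---|---|---|
| 1 | Ministral 3 3B | Mistral | | $0.04 | $0.04 | $1.00 |
| 2 | Ministral 3 8B | Mistral | 256K | $0.15 | $0.15 | $3.75 |
| 3 | Ministral 3 14B | Mistral | | $0.20 | $0.20 | $5.00 |
| 4 | Mistral Small 4 | Mistral | 256K | $0.15 | $0.60 | $6.00 |
| 5 | Mistral Large 3 | Mistral | 256K | $0.50 | $1.50 | $17.50 |
| 6 | Grok Build 0.1 | xAI | 256K | $1.00 | $2.00 | $30.00 |
| 7 | Grok 4.3 | xAI | 1M | $1.25 | $2.50 | $37.50 |
| 8 | Grok 4.20 (0309) Reasoning | xAI | 1M | $1.25 | $2.50 | $37.50 |
| 9 | Grok 4.20 (0309) Non-Reasoning | xAI | 1M | $1.25 | $2.50 | $37.50 |
| 10 | Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00 | $45.00 |
| 11 | Mistral Medium 3.5 | Mistral | | $1.50 | $7.50 | $67.50 |
| 12 | Claude Sonnet 4.6 | Anthropic | 1M | $3.00 | $15.00 | $135.00 |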

Estimate only; excludes prompt caching, batch discounts and free tiers. Different volumes can change the ranking, so run your own numbers. Prices verified against official provider docs; catalog updated 2026-06-28.
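To run your own numbers, here is a minimal sketch that re-ranks a few rows from the table at any monthly volume. The `PRICES` dict, `monthly_cost` and `rank` are illustrative names, not part of any provider SDK; prices are copied from the table (USD per 1M tokens) and volumes are in millions of tokens.

```python
# Minimal sketch: re-rank a few models from the table at your own monthly volumes.
# Prices are USD per 1M tokens copied from the table; volumes are millions of tokens.
PRICES = {
    "Ministral 3 3B": (0.04, 0.04),
    "Ministral 3 8B": (0.15, 0.15),
    "Ministral 3 14B": (0.20, 0.20),
    "Mistral Small 4": (0.15, 0.60),
    "Mistral Large 3": (0.50, 1.50),
}


def monthly_cost(input_m: float, output_m: float, price_in: float, price_out: float) -> float:
    """USD per month: each side's volume times its per-1M price."""
    return input_m * price_in + output_m * price_out


def rank(input_m: float, output_m: float) -> list[tuple[str, float]]:
    """(model, monthly cost) pairs, cheapest first."""
    costs = [
        (name, round(monthly_cost(input_m, output_m, p_in, p_out), 2))
        for name, (p_in, p_out) in PRICES.items()
    ]
    return sorted(costs, key=lambda item: item[1])


if __name__ == "__main__":
    for label, (inp, out) in {"guide mix 20M/5M": (20, 5), "input-heavy 100M/1M": (100, 1)}.items():
        print(label)
        for name, cost in rank(inp, out):
            print(f"  {name:16} ${cost:7.2f}")
```

At the guide's 20M/5M mix, Ministral 3 14B ($5.00) undercuts Mistral Small 4 ($6.00). At a more input-heavy 100M/1M mix the order flips: Small 4 costs $15.60 against $20.20 for 14B, because its lower input price outweighs its higher output price.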

Methodology

We include only models that accept image input (image understanding, not image generation), then rank them by the cost of a 20M-input / 5M-output monthly mix. Image tokens are billed on the input side, so a low input price is the biggest lever.
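The filter-then-rank procedure can be sketched as below. This is an illustration under stated assumptions, not the site's actual pipeline: the `Model` record and its `image_input` / `generally_available` flags are hypothetical, and the two `hypothetical-*` entries exist only to show what the filter drops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_in: float            # USD per 1M input tokens
    price_out: float           # USD per 1M output tokens
    image_input: bool          # accepts images as input (understanding, not generation)
    generally_available: bool  # hypothetical flag: False for previews / limited access


def vision_ranking(models: list[Model], input_m: float = 20.0, output_m: float = 5.0) -> list[tuple[str, float]]:
    """Keep GA models that accept image input, then sort by monthly USD cost, cheapest first."""
    eligible = [m for m in models if m.image_input and m.generally_available]
    priced = [(m.name, round(input_m * m.price_in + output_m * m.price_out, 2)) for m in eligible]
    # sorted() is stable, so models with equal cost keep their catalog order.
    return sorted(priced, key=lambda pair: pair[1])


# Three rows from the table, plus two hypothetical entries the filter should drop.
catalog = [
    Model("Claude Haiku 4.5", 1.00, 5.00, True, True),
    Model("Ministral 3 3B", 0.04, 0.04, True, True),
    Model("Mistral Small 4", 0.15, 0.60, True, True),
    Model("hypothetical-text-only", 0.01, 0.01, False, True),
    Model("hypothetical-preview", 0.02, 0.02, True, False),
]

if __name__ == "__main__":
    for name, cost in vision_ranking(catalog):
        print(f"{name:18} ${cost:7.2f}")
```

The stable sort matters for ties such as the three $37.50 Grok rows: they stay in whatever order the catalog lists them.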

FAQ


What is the cheapest LLM for vision?

Ministral 3 3B (Mistral) is the cheapest generally available model we track for vision, at $0.04 per 1M input tokens and $0.04 per 1M output tokens: about $1.00/month for an image-understanding workload of ~20M input and ~5M output tokens a month. Ministral 3 8B is the next cheapest at $3.75/month.
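The monthly figure is volume times price on each side. Writing V_in = 20 and V_out = 5 (millions of tokens per month) and p_in, p_out for the prices in USD per 1M tokens, the two cheapest rows work out as:

```latex
\begin{aligned}
C &= V_{\text{in}}\,p_{\text{in}} + V_{\text{out}}\,p_{\text{out}} \\
C_{\text{Ministral 3 3B}} &= 20 \times 0.04 + 5 \times 0.04 = 0.80 + 0.20 = \$1.00 \\
C_{\text{Ministral 3 8B}} &= 20 \times 0.15 + 5 \times 0.15 = 3.00 + 0.75 = \$3.75
\end{aligned}
```

Because V_in is four times V_out in this mix, each extra $0.01 per 1M on the input price adds $0.20/month, while the same increase on the output price adds only $0.05/month.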

How is "cheapest for vision" calculated?

We price a representative monthly workload (image understanding at about 20M input and 5M output tokens a month) against every generally available model, then rank by total cost. Only models that accept image input qualify. All prices are USD per 1M tokens, sourced from official provider documentation.

Is the cheapest model always the right choice for vision?

No. Price is one axis; quality, latency, rate limits and reliability matter too. Use this ranking to build a shortlist, then test the top candidates on your own vision workload before committing. Cost is easy to measure; fit is not.