Models / DeepSeek

DeepSeek-V4-Flash

Preview

Smaller/cheaper V4 model (~284B total / ~13B active params per authoritative third-party reports). Context length 1M, max output 384K tokens. Input price $0.14/M cache-miss, $0.0028/M cache-hit; output $0.28/M (USD). Supports dual modes (Thinking / Non-Thinking), JSON output, tool calls, chat prefix completion; FIM completion is non-thinking-mode only. Concurrency limit 2500. The legacy aliases deepseek-chat and deepseek-reasoner currently route to this model (non-thinking / thinking respectively). Part of the 'DeepSeek V4 Preview' generation (released 2026-04-24), hence status=preview. Knowle

Provider

DeepSeek

Status

Preview

Input price

$0.14 / 1M tokens

Output price

$0.28 / 1M tokens

Cached input

$0.003 / 1M tokens

Blended price

$0.175 / 1M tokens

Context window

1,000,000 tokens (1M)

Max output

384,000 tokens

Modality

text

Knowledge cutoff

—

Released

24 Apr 2026

API string

deepseek-v4-flash

Source: DeepSeek official documentation ↗

Compare DeepSeek-V4-Flash with…

DeepSeek-V4-Flash vs Claude Opus 4.8→

$0.175 vs $10 blended /M

DeepSeek-V4-Flash vs Claude Opus 4.7→

$0.175 vs $10 blended /M

DeepSeek-V4-Flash vs Claude Opus 4.6→

$0.175 vs $10 blended /M

DeepSeek-V4-Flash vs Claude Opus 4.5→

$0.175 vs $10 blended /M

DeepSeek-V4-Flash

Compare DeepSeek-V4-Flash with…

Track DeepSeek-V4-Flash price & status changes