Models / DeepSeek

DeepSeek-V4-Flash

Preview

Smaller/cheaper V4 model (~284B total / ~13B active params per authoritative third-party reports). Context length 1M, max output 384K tokens. Input price $0.14/M cache-miss, $0.0028/M cache-hit; output $0.28/M (USD). Supports dual modes (Thinking / Non-Thinking), JSON output, tool calls, chat prefix completion; FIM completion is non-thinking-mode only. Concurrency limit 2500. The legacy aliases deepseek-chat and deepseek-reasoner currently route to this model (non-thinking / thinking respectively). Part of the 'DeepSeek V4 Preview' generation (released 2026-04-24), hence status=preview. Knowle

Provider
DeepSeek
Status
Preview
Input price
$0.14 / 1M tokens
Output price
$0.28 / 1M tokens
Cached input
$0.003 / 1M tokens
Blended price
$0.175 / 1M tokens
Context window
1,000,000 tokens (1M)
Max output
384,000 tokens
Modality
text
Knowledge cutoff
Released
24 Apr 2026
API string
deepseek-v4-flash

Source: DeepSeek official documentation ↗