Beta version: Information might not be fully accurate. Please report any discrepancies.

DeepSeek | Verified | Open Weights | 21 benchmarks

DeepSeek-V3.2-Speciale

Released 2025-12-01 | Dense Enhanced Architecture

Training: 2024-07

Verified Model Card

Latest Data: 2026-02-20

Context Window: 256k tokens

Input Cost: $0.27 per 1M tokens

Output Cost: $0.41 per 1M tokens

Parameters: Dense Enhanced (model footprint)
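The per-token prices listed above can be turned into a rough per-request cost estimate. The following is a minimal sketch, assuming billing is linear in token count with no caching discounts, tiers, or minimum charges; the function name `estimate_cost_usd` and the constant names are hypothetical and not part of any DeepSeek API.

```python
# Prices copied from the card above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.27
OUTPUT_USD_PER_MILLION = 0.41


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars.

    Assumption: cost scales linearly with token counts; real billing
    may differ (caching discounts, rounding, pricing changes).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a request with 100,000 input tokens and 20,000 output tokens.
print(f"{estimate_cost_usd(100_000, 20_000):.4f}")  # prints 0.0352
```

Under these assumptions, output tokens cost roughly 1.5 times as much as input tokens, so long generated responses weigh more heavily in the total than long prompts of the same length.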

Benchmark Provenance

Performance Analysis // Verified Benchmarks

MMLU (5-shot) | Knowledge

88.5 / 100

Verified

Last Verified: Unknown Date | DeepSeek News

Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.

SWE-bench Verified | Coding

73.1 / 100

Verified

Last Verified: Unknown Date | DeepSeek News

Measures how well a model resolves real-world GitHub issues. The Verified subset contains only issues confirmed to be solvable.

MMMU (Multimodal) | Multimodal

74 / 100

Verified

Last Verified: Unknown Date | DeepSeek News

Multi-discipline Multimodal Understanding and Reasoning.

LiveBench | Reasoning

62.2 / 100

Verified

Last Verified: 2026-02-20 | LiveBench

Contamination-free, continuously updated reasoning benchmark.

LMArena ELO | Real-world

1423 / 1700

Verified

Last Verified: Unknown Date | DeepSeek News

Chatbot Arena ELO score. Crowd-sourced human preference ranking.

AA Intelligence Index | Real-world

34.1* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis aggregate intelligence index.

MMLU-Pro | Science

86.3* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.

HLE | Science

26.1* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Humanity's Last Exam: a hard reasoning benchmark, evaluated without tools.

AA Math Index | Math

96.7* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis aggregate math capability index.

MathArena Apex | Math

76.5 / 100

Verified

Last Verified: Unknown Date | DeepSeek News

Competitive math arena for top-tier reasoning models.

Codeforces | Coding

2701 / 4000

Verified

Last Verified: Unknown Date | DeepSeek News

Competitive programming rating based on problem solving.

LiveCodeBench v6 | Coding

89.6* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Contamination-free coding benchmark using recent problems.

AA Coding Index | Coding

37.9* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis aggregate coding capability index.

GPQA Diamond | STEM

87.1* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Graduate-Level Google-Proof Q&A Benchmark.

AA-LCR | Long Context

59.3* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis Long Context Reasoning benchmark. Evaluates reasoning over long contexts.

IFBench | Instruction Following

63.9* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis IFBench. Evaluates precise instruction following with constraints.

AIME 2025 | Math

96.7* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

American Invitational Mathematics Examination 2025 problems.

Terminal-Bench 2.0 | Agentic

39.3 / 100

Verified

Last Verified: Unknown Date | DeepSeek News

Agent performance in realistic terminal workflows (v2.0 leaderboard).

Terminal-Bench Hard | Agentic

34.8* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Hard split of Terminal-Bench focused on tougher terminal workflows.

TAU-Bench Telecom | Agentic

0* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Telecom-domain tool-use and workflow benchmark.

SciCode | Advanced Tasks

44* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Scientific programming benchmark for code synthesis and correctness.