Beta version: information might not be fully accurate. Please report any discrepancies.

DeepSeek | Verified | Open Weights | 14 benchmarks

DeepSeek-R1-Distill-Qwen-32B

Released 2025-01-20 | 32B (Distilled) Architecture

Verified Model Card

Latest Data: 2026-02-16

Context Window: 128k tokens

Input Cost: $0.07 per 1M tokens

Output Cost: $0.14 per 1M tokens

Parameters (model footprint): 32B (Distilled)
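
As a rough illustration of how the listed per-token prices translate into a request cost, here is a minimal sketch. It assumes the card's rates ($0.07 input and $0.14 output per 1M tokens) apply linearly with no minimums, caching discounts, or provider markups; the function and constant names are hypothetical and not part of any DeepSeek or provider API.

```python
from decimal import Decimal

# Rates as listed on this card, in USD per 1M tokens (assumed to apply linearly).
INPUT_USD_PER_M = Decimal("0.07")
OUTPUT_USD_PER_M = Decimal("0.14")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_USD_PER_M
             + Decimal(output_tokens) * OUTPUT_USD_PER_M)
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 50,000-token response.
    print(estimate_cost_usd(200_000, 50_000))
```

Under these assumptions, 200,000 input tokens and 50,000 output tokens come to $0.014 + $0.007 = $0.021. Reasoning models tend to emit long outputs, so the output rate often dominates in practice.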

Benchmark Provenance

Performance Analysis // Verified Benchmarks

MMLU (5-shot) | Knowledge

83 / 100

Verified

Last Verified: Unknown Date | DeepSeek News

Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.

MATH (CoT) | Math

88 / 100

Verified

Last Verified: Unknown Date | DeepSeek News

Challenging competition mathematics problems drawn from contests such as AMC 10, AMC 12, and AIME.

AIME 2024/25 | Math

68.7* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

American Invitational Mathematics Examination. Competition-level math.

AA Intelligence Index | Real-world

17.2* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis aggregate intelligence index.

MMLU-Pro | Science

73.9* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.

HLE | Science

5.5* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Humanity's Last Exam: a hard reasoning benchmark, evaluated without tools.

AA Math Index | Math

63* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis aggregate math capability index.

MATH-500 | Math

94.1* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

500-problem math benchmark for broad quantitative reasoning.

LiveCodeBench v6 | Coding

27* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Contamination-free coding benchmark using recent problems.

GPQA Diamond | STEM

61.5* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Graduate-Level Google-Proof Q&A Benchmark.

AA-LCR | Long Context

9.7* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis Long Context Reasoning benchmark. Evaluates reasoning over long contexts.

IFBench | Instruction Following

22.9* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Artificial Analysis IFBench. Evaluates precise instruction following with constraints.

AIME 2025 | Math

63* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

American Invitational Mathematics Examination 2025 problems.

SciCode | Advanced Tasks

37.6* / 100

Third-party

Last Verified: 2026-02-16 | Artificial Analysis (Independent)

Scientific programming benchmark for code synthesis and correctness.