Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-20
Context Window: 262k tokens
Input Cost: $0.20 per 1M tokens
Output Cost: $0.88 per 1M tokens (see the cost sketch below)
Parameters: 235B (22B active) model footprint
Variants Available: 1
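As a quick worked example of the pricing above, here is a minimal Python sketch that converts the listed per-1M-token rates into a per-request cost. The rates are taken from the table; the token counts in the example are hypothetical.

```python
# Rates taken from the spec table above (USD per 1M tokens).
INPUT_RATE_PER_M = 0.20
OUTPUT_RATE_PER_M = 0.88

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Hypothetical example: a 50k-token prompt with an 8k-token completion.
print(f"${request_cost(50_000, 8_000):.4f}")  # -> $0.0170
```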
Performance Analysis // Verified Benchmarks
Challenging competition mathematics problems (AIME/IMO level).
Functional correctness of synthesized programs from docstrings (see the pass@k sketch after this list).
Contamination-free, continuously updated reasoning benchmark.
Chatbot Arena Elo score. Crowd-sourced human preference ranking (see the Elo sketch after this list).
Artificial Analysis aggregate intelligence index.
A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.
Humanity's Last Exam - Hard reasoning benchmark without tools.
Artificial Analysis aggregate math capability index.
Harvard-MIT Mathematics Tournament - High difficulty competition math.
International Mathematical Olympiad style answer-only benchmark.
Contamination-free coding benchmark using recent problems.
Artificial Analysis aggregate coding capability index.
Graduate-Level Google-Proof Q&A Benchmark.
Artificial Analysis Long Context Reasoning benchmark; evaluates reasoning over long-document contexts.
Artificial Analysis IFBench. Evaluates precise instruction following with constraints.
American Invitational Mathematics Examination 2025 problems.
Hard split of Terminal-Bench focused on tougher terminal workflows.
Verified desktop computer-use benchmark for end-to-end task completion.
Browser-based autonomous task execution benchmark.
Telecom-domain tool-use and workflow benchmark.
Scientific programming benchmark for code synthesis and correctness.
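Docstring-to-program benchmarks like the one above are conventionally scored with the pass@k metric; whether this page uses exactly that metric is an assumption, but the standard unbiased estimator is worth sketching. Here n is the number of samples drawn per problem, c the number that pass the unit tests, and k the sampling budget being scored.

```python
# Minimal sketch of the standard unbiased pass@k estimator
# (assumption: the functional-correctness benchmark above is
# scored HumanEval-style; the sample counts below are hypothetical).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n samples of which c are correct, passes."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 passing, scored at k = 1 and k = 10.
print(pass_at_k(200, 37, 1))   # 0.185 (equals c / n at k = 1)
print(pass_at_k(200, 37, 10))  # ~0.877
```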
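Similarly, the Chatbot Arena entry reports an Elo score. A minimal sketch of the standard Elo expected-score formula on the conventional 400-point logistic scale; the ratings in the example are hypothetical.

```python
# Minimal sketch of the Elo expected-score formula that underlies
# Chatbot Arena-style rankings (standard 400-point logistic scale).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Hypothetical example: a model rated 1300 vs. one rated 1250.
print(expected_score(1300, 1250))  # ~0.571
```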