Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-20
Context Window: 200k tokens
Input Cost: $0.30 per 1M tokens
Output Cost: $2.40 per 1M tokens
Cache Cost: $0.03 read / Free write (per 1M)
Parameters (model footprint): 230B MoE (10B active)
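The per-1M-token prices listed above can be turned into a rough per-request estimate. The following is a minimal sketch, not a billing reference: the function name `request_cost` and its parameters are hypothetical, and it assumes that cache-read tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate. The card does not state how the provider actually meters cached tokens.

```python
from decimal import Decimal

# Prices copied from the card, in USD per 1M tokens.
INPUT_PER_M = Decimal("0.30")
OUTPUT_PER_M = Decimal("2.40")
CACHE_READ_PER_M = Decimal("0.03")
CACHE_WRITE_PER_M = Decimal("0")  # listed as "Free", so it adds nothing below
TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens, cached_read_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_read_tokens is the part of input_tokens served
    from cache, billed at the cache-read rate instead of the input rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 <= cached_read_tokens <= input_tokens:
        raise ValueError("cached_read_tokens must be between 0 and input_tokens")

    fresh_input_tokens = input_tokens - cached_read_tokens
    total = (
        fresh_input_tokens * INPUT_PER_M
        + cached_read_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 100k input tokens (80k of them cache hits) and 2k output tokens.
    print(request_cost(100_000, 2_000, cached_read_tokens=80_000))
```

Under these assumptions, a request with 100,000 input tokens (80,000 of them read from cache) and 2,000 output tokens comes to $0.0132. `Decimal` is used rather than floats so that small per-token amounts do not pick up rounding noise when summed over many requests.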
Performance Analysis // Verified Benchmarks
Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.
Resolving real-world GitHub issues. Verified subset ensures solvable issues.
Contamination-free, continuously updated reasoning benchmark.
Artificial Analysis aggregate intelligence index.
Humanity's Last Exam - Hard reasoning benchmark without tools.
Artificial Analysis aggregate coding capability index.
Graduate-Level Google-Proof Q&A Benchmark.
Artificial Analysis Long Context Reasoning benchmark. Evaluates reasoning over long contexts.
Artificial Analysis IFBench. Evaluates precise instruction following with constraints.
American Invitational Mathematics Examination 2025 problems.
Hard split of Terminal-Bench focused on tougher terminal workflows.
Higher-difficulty SWE-bench subset for frontier coding agents.
Telecom-domain tool-use and workflow benchmark.
Scientific programming benchmark for code synthesis and correctness.