Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-16
Context Window: 32k tokens
Input Cost: $0.60 per 1M tokens
Output Cost: $2.40 per 1M tokens
Parameters: 132B (MoE) model footprint
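The per-token rates above make request pricing a simple linear calculation. A minimal sketch, using the listed rates; the token counts in the example are hypothetical illustration values:

```python
# Rates from the listing: $0.60 per 1M input tokens, $2.40 per 1M output tokens.
INPUT_RATE = 0.60 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.40 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 10k-token prompt with a 2k-token completion:
print(f"${request_cost(10_000, 2_000):.4f}")  # prints "$0.0108"
```

Output tokens dominate the bill at these rates: each generated token costs four times as much as each prompt token.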
Performance Analysis // Verified Benchmarks
MMLU: Massive Multitask Language Understanding; covers 57 subjects across STEM, the humanities, social sciences, and more.
HumanEval: functional correctness of programs synthesized from docstrings.
GSM8K: grade-school math word problems requiring multi-step reasoning.
AIME: American Invitational Mathematics Examination; competition-level math.
Artificial Analysis Intelligence Index: aggregate intelligence score across multiple evaluations.
MMLU-Pro: a harder, more robust version of MMLU, focused on complex reasoning and STEM subjects.
HLE: Humanity's Last Exam; hard reasoning benchmark without tools.
MATH-500: 500-problem math benchmark for broad quantitative reasoning.
LiveCodeBench: contamination-free coding benchmark using recent problems.
GPQA: Graduate-Level Google-Proof Q&A benchmark.
SciCode: scientific programming benchmark for code synthesis and correctness.
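Functional-correctness coding benchmarks of this kind are commonly scored with the pass@k metric. A minimal sketch of the standard unbiased estimator introduced with HumanEval (Chen et al., 2021), where n samples are generated per problem and c of them pass the tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    randomly drawn samples (out of n generated, c correct) passes."""
    if n - c < k:
        # Fewer incorrect samples than the draw size: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples per problem, 3 passing, evaluated at k=1:
print(pass_at_k(10, 3, 1))  # prints 0.3
```

Averaging this quantity over all problems gives the benchmark's reported pass@k score; computing it from combinations rather than from the raw ratio (c/n)**k avoids the bias of naive sampling.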