Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-02-18
Context Window: 400k tokens
Input Cost: $21.00 per 1M tokens
Output Cost: $168.00 per 1M tokens
Parameters (model footprint): Unknown (MoE)
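At these list rates, the dollar cost of a request follows directly from its token counts. A minimal sketch (the 20k/2k token counts are hypothetical example sizes, not from this page):

```python
# Per-token rates derived from the listed per-1M-token prices.
INPUT_RATE = 21.00 / 1_000_000    # $ per input token
OUTPUT_RATE = 168.00 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 20k-token prompt producing a 2k-token completion.
# 20_000 * $0.000021 + 2_000 * $0.000168 = $0.756
print(f"${request_cost(20_000, 2_000):.3f}")
```

Note that output tokens cost 8x input tokens here, so long completions dominate the bill even for prompt-heavy workloads.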
Performance Analysis // Verified Benchmarks
- MMLU: Massive Multitask Language Understanding; covers 57 subjects across STEM, the humanities, social sciences, and more.
- Competition mathematics: challenging problems at the AIME/IMO level.
- HumanEval: functional correctness of programs synthesized from docstrings.
- SWE-bench Verified: resolving real-world GitHub issues; the verified subset ensures issues are solvable.
- Chatbot Arena: crowd-sourced human preference ranking, reported as an Elo score.
- MMLU-Pro: a harder, more robust version of MMLU focused on complex reasoning and STEM subjects.
- Humanity's Last Exam (HLE): hard reasoning benchmark, evaluated without tools.
- GPQA: Graduate-Level Google-Proof Q&A Benchmark.
- ARC-AGI-1: Abstraction and Reasoning Corpus, level 1.
- AIME 2025: American Invitational Mathematics Examination 2025 problems.
- ARC-AGI-2: Abstraction and Reasoning Corpus, level 2 (extreme difficulty).
- MMMU-Pro: professional-level expansion of MMMU.
- CharXiv (Reasoning QA): chart-based reasoning over figures from arXiv papers.
- SWE-bench, hard subset: higher-difficulty tasks for frontier coding agents.
- Video-MMMU: video variant of MMMU for multimodal understanding and reasoning.
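The Arena ranking above is expressed as an Elo score. As a rough sketch of how a rating gap maps to an expected win rate, here is the classic Elo formula (Arena's actual leaderboard is fit with a closely related Bradley-Terry model, so this is an approximation, not their exact method):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the
    classic Elo model: a logistic curve on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Equal ratings -> 50/50; a 100-point gap -> about a 64% win rate.
print(round(elo_expected_score(1400, 1300), 2))  # 0.64
```

This is why small score differences near the top of the leaderboard correspond to nearly coin-flip human preferences.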