MMLU (5-shot)
Knowledge
90 / 100
Verified
Last Verified: Unknown Date
OpenAI Blog
Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.
Beta version: information might not be fully accurate. Please report any discrepancies.
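Here is a rough sketch of what "5-shot" means. Each test question is preceded by five solved examples, typically drawn from the same subject's dev split, and the score is the share of questions answered correctly. The prompt header is modeled on the format used in the original MMLU release. The `Question` type and all function names are hypothetical. Reading "90 / 100" as percent accuracy is an assumption, because the card does not state how the score is aggregated.

```python
from dataclasses import dataclass

LETTERS = "ABCD"

@dataclass
class Question:
    subject: str        # e.g. "college physics"
    stem: str           # question text
    choices: list[str]  # exactly four options, rendered as A-D
    answer: str         # gold letter, e.g. "B"

def format_question(q: Question, include_answer: bool) -> str:
    """Render one question; solved examples carry their answer letter."""
    lines = [q.stem]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, q.choices)]
    lines.append(f"Answer: {q.answer}" if include_answer else "Answer:")
    return "\n".join(lines)

def build_5shot_prompt(shots: list[Question], target: Question) -> str:
    """Five solved examples, then the unanswered target question."""
    if len(shots) != 5:
        raise ValueError("5-shot prompting uses exactly five solved examples")
    header = ("The following are multiple choice questions (with answers) "
              f"about {target.subject}.\n\n")
    examples = "\n\n".join(format_question(s, include_answer=True) for s in shots)
    return header + examples + "\n\n" + format_question(target, include_answer=False)

def accuracy_percent(predictions: list[str], golds: list[str]) -> float:
    """Share of predicted letters matching the gold letters, on a 0-100 scale."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    correct = sum(p == g for p, g in zip(predictions, golds))
    return 100.0 * correct / len(golds)
```

A harness would send the output of `build_5shot_prompt` to the model and read the letter it emits after the final `Answer:`. It would then pass the collected letters to `accuracy_percent`. This card does not say whether the reported figure averages over all questions or over the 57 subjects.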
Latest Data: 2026-02-20
Context Window: 128k tokens
Input Cost (per 1M tokens): Free
Output Cost (per 1M tokens): Free
Parameters: 120B (model footprint)
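The spec figures support some back-of-the-envelope arithmetic. The sketch below rests on three assumptions:

- Prices are quoted in USD per 1M tokens.
- Prompt and output share the 128k window, read here as 128,000 tokens.
- "120B" is a rounded parameter count.

All function names are hypothetical. Weight memory is computed as parameters times bytes per parameter. It ignores the KV cache, activations, and runtime overhead.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request when prices are quoted in USD per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Assumption: "128k" is read as 128,000 tokens; some providers mean 131,072 (128 * 1024).
CONTEXT_WINDOW = 128_000

def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """Assumes prompt and requested output share one context window."""
    return prompt_tokens + max_output_tokens <= window

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

if __name__ == "__main__":
    # With the card's "Free" pricing, any request costs 0.0.
    print(request_cost_usd(10_000, 2_000, 0.0, 0.0))
    # 120B parameters at 2 bytes each (16-bit) and at 0.5 bytes each (4-bit).
    print(weight_memory_gb(120e9, 2.0))
    print(weight_memory_gb(120e9, 0.5))
```

At 16-bit precision the weights alone come to about 240 GB, and at 4-bit about 60 GB. Actual serving memory is higher once the KV cache and activations are included.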
Performance Analysis // Verified Benchmarks
Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.
Challenging competition mathematics problems (AIME/IMO level).
Contamination-free, continuously updated reasoning benchmark.
Humanity's Last Exam: a hard reasoning benchmark, evaluated without tools.
Graduate-Level Google-Proof Q&A Benchmark.
Artificial Analysis IFBench: evaluates precise instruction following under explicit constraints.
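Constraint-based instruction-following benchmarks score responses with checks a program can run. The sketch below is not IFBench's constraint set or scorer. Its three constraints (word limit, required keyword, bullet count) and its all-or-nothing pass rule are hypothetical. They only illustrate how a response can be verified against explicit instructions.

```python
import re
from typing import Callable

# Hypothetical constraint type: takes a response, returns pass/fail.
Constraint = Callable[[str], bool]

def max_words(n: int) -> Constraint:
    """Response has at most n whitespace-separated words."""
    return lambda text: len(text.split()) <= n

def must_include(keyword: str) -> Constraint:
    """Response mentions the keyword (case-insensitive)."""
    return lambda text: keyword.lower() in text.lower()

def bullet_count(n: int) -> Constraint:
    """Response has exactly n lines starting with '- ' or '* '."""
    return lambda text: len(re.findall(r"^\s*[-*] ", text, flags=re.MULTILINE)) == n

def follows_all(response: str, constraints: list[Constraint]) -> bool:
    """All-or-nothing: pass only if every constraint holds."""
    return all(check(response) for check in constraints)
```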