Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: Unknown
Context Window: 256k tokens
Input Cost: $0.15 per 1M tokens
Output Cost: $0.60 per 1M tokens
Parameters: MoE optimized model footprint
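The listed per-token prices can be turned into a rough per-request estimate. The sketch below is only an illustration: the function name `estimate_cost_usd` and the constant names are hypothetical, and it assumes billing is strictly linear in token count at the listed rates, with no caching discounts, minimum charges, or tiered pricing.

```python
# Rates copied from the listing above, in USD per 1M tokens.
# Constant and function names are hypothetical, chosen for illustration.
INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a long prompt with a short completion.
    cost = estimate_cost_usd(input_tokens=200_000, output_tokens=2_000)
    print(f"Estimated cost: ${cost:.4f}")
```

Under these assumptions, a 200,000-token prompt with a 2,000-token reply comes to roughly three cents, and the input side dominates even though output tokens are priced four times higher.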
Performance Analysis // Verified Benchmarks
MMLU: Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.
HumanEval: Functional correctness of programs synthesized from docstrings.
Chatbot Arena: Elo score from a crowd-sourced human preference ranking.
MMLU-Pro: A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.
MATH-500: 500-problem math benchmark for broad quantitative reasoning.
LiveCodeBench: Contamination-free coding benchmark using recent problems.
GPQA: Graduate-Level Google-Proof Q&A Benchmark.
MathVista: Mathematical reasoning in visual contexts.
DocVQA: Document visual question answering on scanned and digital documents.