Beta version: Information might not be fully accurate. Please report any discrepancies.

LLM Registry

Independent source of truth for LLM benchmark scores with provenance tracking and normalized rankings.

Data Sources: Artificial Analysis, Models.dev

© 2026 LLM Registry

Meta | Verified | Open Weights | 8 benchmarks

Llama 4 Behemoth

Released 2026-01-14 | Hybrid Dense/MoE Architecture

Verified Model Card

Latest Data: Unknown

Context Window: 1.0M tokens

Input Cost: $2.50 per 1M tokens

Output Cost: $7.50 per 1M tokens

Parameters: Hybrid Dense/MoE (model footprint)

Benchmark Provenance

Performance Analysis // Verified Benchmarks

MMLU (5-shot) | Knowledge

85.8 / 100

Verified | Last Verified: Unknown Date | Meta AI

Massive Multitask Language Understanding covers 57 subjects across STEM, the humanities, social sciences, and more.

84.5 / 100

Verified | Last Verified: Unknown Date | Meta AI

Challenging competition mathematics problems (AIME/IMO level).

HumanEval | Coding

88.9 / 100

Verified | Last Verified: Unknown Date | Meta AI

Functional correctness of synthesized programs from docstrings.

SWE-bench Verified | Coding

48.2 / 100

Verified | Last Verified: Unknown Date | Meta AI

Resolving real-world GitHub issues. The Verified subset contains only issues confirmed to be solvable.

MMMU (Multimodal) | Multimodal

76.1 / 100

Verified | Last Verified: Unknown Date | Meta AI

Multi-discipline Multimodal Understanding and Reasoning.

LMArena ELO | Real-world

1310 / 1700

Verified | Last Verified: Unknown Date | Meta AI

Chatbot Arena ELO score. Crowd-sourced human preference ranking.

MMLU-Pro | Science

82.2 / 100

Verified | Last Verified: Unknown Date | Meta AI

A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.

GPQA Diamond | STEM

69.8 / 100

Verified | Last Verified: Unknown Date | Meta AI

Graduate-Level Google-Proof Q&A Benchmark.

Metadata

License: Open Weights

Context Window: 1,000,000 tokens

Input Pricing: $2.50 / 1M tokens

Output Pricing: $7.50 / 1M tokens

Modality: text, code, vision
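
The per-token pricing listed above can be turned into a rough per-request cost estimate. The following is a minimal sketch, assuming the listed prices apply linearly to every token with no caching discounts or tiered rates; the function name `estimate_cost` and the constant names are hypothetical and are not part of any LLM Registry API.

```python
# Prices copied from the listing above (USD per 1M tokens).
# Assumption: billing is strictly linear in token count.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 7.50
TOKENS_PER_MILLION = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / TOKENS_PER_MILLION * INPUT_PRICE_PER_M
    output_cost = output_tokens / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 200k-token prompt with a 10k-token completion.
    print(f"${estimate_cost(200_000, 10_000):.4f}")
```

For example, a prompt of 200,000 tokens with a 10,000-token completion works out to roughly $0.50 for input plus $0.075 for output. Actual provider billing may differ from this estimate.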
