Beta version: Information might not be fully accurate. Please report any discrepancies.

Zhipu AI | Verified | Open Weights | 16 benchmarks

GLM 5.1

Released: 2026-04-07 | Architecture: 744B total (40B active)

Verified Model Card

Latest Data: 2026-05-21

Context Window: 200k tokens

Input Cost: $1.40 per 1M tokens

Output Cost: $4.40 per 1M tokens

Cache Cost: $0.26 read / free write, per 1M tokens

Parameters (model footprint): 744B total (40B active)
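As a rough illustration of how the listed prices combine, the following is a minimal sketch of a per-request cost estimate. The function name `estimate_cost` and the billing rule are assumptions, not something the listing states: it assumes cached input tokens are billed at the cache-read rate instead of the input rate, and that cache writes add no charge.

```python
# Prices in USD per 1M tokens, copied from the listing.
INPUT_PER_M = 1.40
OUTPUT_PER_M = 4.40
CACHE_READ_PER_M = 0.26  # cache writes are listed as free


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the portion of input_tokens served from
    cache, billed at the cache-read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 100k input tokens, 2k output tokens, no cache hits
    print(f"${estimate_cost(100_000, 2_000):.4f}")
    # Same request with 80k of the input served from cache
    print(f"${estimate_cost(100_000, 2_000, cached_tokens=80_000):.4f}")
```

Under these assumptions the first request costs about $0.1488 and the second about $0.0576. Actual billing may differ by provider.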

Benchmark Provenance

Performance Analysis // Verified Benchmarks

LMArena ELO [Real-world]: 1472 / 1700

Verified | Last Verified: 2026-05-21 | Source: Chatbot Arena Leaderboard

Chatbot Arena ELO score. Crowd-sourced human preference ranking.

LMArena WebDev ELO [Coding]: 1532 / 1700

Verified | Last Verified: 2026-05-21 | Source: Chatbot Arena Leaderboard

WebDev Arena ELO score. Human preference ranking for web development tasks.

HLE [Science]: 31 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Humanity's Last Exam - Hard reasoning benchmark without tools.

HLE-Full (w/ tools) [Science]: 52.3 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Humanity's Last Exam full evaluation with tool access enabled.

AIME 2026 [Math]: 95.3 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

American Invitational Mathematics Examination 2026 - Competition math benchmark.

HMMT Nov 2025 [Math]: 94 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Harvard-MIT Mathematics Tournament November 2025 - High difficulty competition math.

HMMT Feb 2026 [Math]: 82.6 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Harvard-MIT Mathematics Tournament February 2026 - High difficulty competition math.

IMO-AnswerBench [Math]: 83.8 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

International Mathematical Olympiad style answer-only benchmark.

CyberGym [Coding]: 68.7 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Cybersecurity-flavored coding benchmark in simulated environments.

GPQA Diamond [STEM]: 86.2 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Graduate-Level Google-Proof Q&A Benchmark.

Terminal-Bench 2.0 [Agentic]: 63.5 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Agent performance in realistic terminal workflows (v2.0 leaderboard).

SWE-bench Pro [Agentic]: 58.4 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Higher-difficulty SWE-bench subset for frontier coding agents.

BrowseComp [Agentic]: 68 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Web browsing + synthesis benchmark for research agents.

MCP Atlas [Agentic]: 71.8 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Multi-step workflows using Model Context Protocol.

Toolathlon [Agentic]: 40.7 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Long-horizon, real-world software tasks.

TAU-Bench [Agentic]: 70.6 / 100

Verified | Last Verified: 2026-04-07 | Source: GLM 5.1 Announcement

Tool-use and API orchestration benchmark for assistants.
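For readers who want a quick per-category view, the sketch below groups the entries scored out of 100 by their category label and averages them. The two ELO entries are left out because they use a different scale. The helper name `mean_by_category` is hypothetical, and averaging across unrelated benchmarks is only a rough summary, not a metric reported by the listing.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, score out of 100), copied from the listing.
SCORES = [
    ("HLE", "Science", 31.0),
    ("HLE-Full (w/ tools)", "Science", 52.3),
    ("AIME 2026", "Math", 95.3),
    ("HMMT Nov 2025", "Math", 94.0),
    ("HMMT Feb 2026", "Math", 82.6),
    ("IMO-AnswerBench", "Math", 83.8),
    ("CyberGym", "Coding", 68.7),
    ("GPQA Diamond", "STEM", 86.2),
    ("Terminal-Bench 2.0", "Agentic", 63.5),
    ("SWE-bench Pro", "Agentic", 58.4),
    ("BrowseComp", "Agentic", 68.0),
    ("MCP Atlas", "Agentic", 71.8),
    ("Toolathlon", "Agentic", 40.7),
    ("TAU-Bench", "Agentic", 70.6),
]


def mean_by_category(rows):
    """Return {category: mean score} for (name, category, score) rows."""
    grouped = defaultdict(list)
    for _name, category, score in rows:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


if __name__ == "__main__":
    for category, avg in sorted(mean_by_category(SCORES).items()):
        print(f"{category}: {avg:.2f}")
```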