Beta version: Information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-05-21
Context Window: 1.0M tokens
Input Cost: $1.50 per 1M tokens
Output Cost: $9.00 per 1M tokens
Parameters: Unknown (model footprint)
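The listed per-token prices can be turned into a rough per-request estimate. The sketch below is illustrative only: it assumes flat pricing (no tiers, caching discounts, or minimum charges, none of which the listing mentions) and that "1M" means 1,000,000 tokens. The function name `estimate_cost` and the example token counts are hypothetical.

```python
# Prices copied from the listing above, in USD per 1M tokens.
INPUT_COST_PER_M = 1.50
OUTPUT_COST_PER_M = 9.00
TOKENS_PER_M = 1_000_000  # assumption: "1M" means exactly 1,000,000 tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming flat per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_COST_PER_M + output_tokens * OUTPUT_COST_PER_M
    return total / TOKENS_PER_M


# Hypothetical request: 200,000 input tokens and 4,000 output tokens.
print(f"${estimate_cost(200_000, 4_000):.4f}")  # prints $0.3360
```

Output tokens cost six times as much as input tokens at these rates, so a long generation can outweigh a large prompt in the total.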
Performance Analysis // Verified Benchmarks
Chatbot Arena Elo score. Crowd-sourced human preference ranking.
WebDev Arena Elo score. Human preference ranking for web development tasks.
A more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.
Humanity's Last Exam - Hard reasoning benchmark without tools.
Graduate-Level Google-Proof Q&A Benchmark.
Abstraction and Reasoning Corpus - Level 2 (Extreme difficulty).
Professional-level MMMU expansion.
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Verified desktop computer-use benchmark for end-to-end task completion.
Higher-difficulty SWE-bench subset for frontier coding agents.
Multi-step workflows using Model Context Protocol.
Financial analysis and reasoning benchmark for agentic workflows.
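For the arena scores listed above, a rating gap can be read as an expected head-to-head preference rate. The sketch below uses the textbook Elo expected-score formula with the conventional 400-point scale. The arena's own fitting procedure may differ, and the ratings in the example are hypothetical, not taken from this page.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula.

    Assumes the conventional base-10 logistic with a 400-point scale;
    a given leaderboard may fit its ratings differently.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings: a 100-point lead for model A.
p = elo_expected_score(1400, 1300)
print(f"A is expected to be preferred in about {p:.0%} of comparisons")
```

Under this formula, equal ratings give 50% and a 100-point lead gives roughly 64%, so small gaps between neighboring leaderboard entries correspond to near coin-flip preferences.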