Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2026-03-18
Context Window: 197k tokens
Input Cost: $0.30 per 1M tokens
Output Cost: $1.20 per 1M tokens
Parameters: 229B MoE (model footprint)
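As a rough illustration of how the per-token prices listed above translate into the cost of a single request, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical; only the two prices come from the listing. It assumes simple linear pricing with no caching discounts, tiers, or minimum charges, which may not match the provider's actual billing.

```python
# Prices taken from the listing above, in USD per 1M tokens.
INPUT_COST_PER_M = 0.30
OUTPUT_COST_PER_M = 1.20

TOKENS_PER_M = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumes linear per-token pricing (an assumption, not confirmed
    by the listing): no caching discounts, tiers, or minimum fees.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / TOKENS_PER_M * INPUT_COST_PER_M
    output_cost = output_tokens / TOKENS_PER_M * OUTPUT_COST_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: a 150k-token prompt and a 4k-token reply.
    cost = estimate_cost(150_000, 4_000)
    print(f"Estimated cost: ${cost:.4f}")
```

Under these assumptions, output tokens cost four times as much as input tokens, so long generations can dominate the bill even when the prompt is much larger.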
Performance Analysis // Verified Benchmarks
Agent performance in realistic terminal workflows (v2.0 leaderboard).
Multi-repository software engineering benchmark.
Higher-difficulty SWE-bench subset for frontier coding agents.
Software engineering performance across multilingual codebases.
Long horizon real-world software tasks.
High-level coding outcome quality benchmark for agent-driven development.
Natural language to repository-wide code edits benchmark.
Lite version of a machine learning engineering benchmark, measuring medal rate.
Multimodal agent benchmark for daily tasks.