Beta version: Information might not be fully accurate. Please report any discrepancies.
Higher-difficulty SWE-bench subset for frontier coding agents.
Score Distribution