An online-judge programming benchmark for Python.
HumanEval
SWE-bench Verified
BigCodeBench
LMArena WebDev ELO
Codeforces
LiveCodeBench v6