Beta version: information might not be fully accurate. Please report any discrepancies.
Cybersecurity-flavored coding benchmark in simulated environments.
Score Distribution
HumanEval (human-eval)
SWE-bench Verified (swe-bench-verified)
BigCodeBench (bigcodebench)
Codeforces (codeforces)
LiveCodeBench v6 (livecodebench-v6)
LiveCodeBench Pro (livecodebench-pro)