A cybersecurity-focused coding benchmark run in simulated environments.
Score Distribution
HumanEval (human-eval)
SWE-bench Verified (swe-bench-verified)
BigCodeBench (bigcodebench)
OJBench (Python) (ojbench-python)
Codeforces (codeforces)
LiveCodeBench v6 (livecodebench-v6)