Beta version: Information might not be fully accurate. Please report any discrepancies.
Hard scientific reasoning benchmark inspired by olympiad-level tasks.
Score Distribution