Beta version: Information might not be fully accurate. Please report any discrepancies.
Vision-language travel-planning and grounded reasoning benchmark.
Score Distribution