IMO ProofBench

Definition

Plain language

A benchmark that grades the quality of an AI's mathematical proofs, not just whether the final answer is right.

As stated in the literature

An evaluation suite that scores complete proofs for correctness and rigor on olympiad-style problems.

Why it matters: It distinguishes models that can actually reason through a proof from those that guess the right answer with hand-wavy justification.

For example, an AI's proof that 'looks right' but skips a key case would lose points on ProofBench even if its final conclusion happens to match the answer key.
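
The contrast in that example can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Submission` type, both scoring functions, and the "fraction of required cases" rubric are hypothetical and are not IMO ProofBench's actual grading scheme.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    final_answer: str
    cases_proved: set[str]  # required cases the proof actually handles


def answer_score(sub: Submission, answer_key: str) -> float:
    """Answer-based grading: full credit if the final answer matches the key."""
    return 1.0 if sub.final_answer.strip() == answer_key.strip() else 0.0


def proof_score(sub: Submission, required_cases: set[str]) -> float:
    """Hypothetical proof-based grading: fraction of required cases handled."""
    if not required_cases:
        return 0.0
    return len(sub.cases_proved & required_cases) / len(required_cases)


# A proof that reaches the right answer but skips the odd case.
required = {"n even", "n odd"}
sub = Submission(final_answer="42", cases_proved={"n even"})

print(answer_score(sub, "42"))     # 1.0
print(proof_score(sub, required))  # 0.5
```

The same submission gets full marks from the answer checker and half marks from the toy proof rubric, which is the kind of gap the benchmark is meant to expose.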

Heard on the show

“… family of models that score ninety-five percent on answer-based math can score twenty percent on IMO ProofBench — a benchmark designed specifically to grade proof quality rather than final answers. …”

Episode 048 — How a 30B Open Model Reached Olympiad Gold With the Right Recipe

Mentioned in 1 episode

048
How a 30B Open Model Reached Olympiad Gold With the Right Recipe