
AnswerBench

Definition

Plain language

A test set that checks whether models give the correct final numerical answer to math problems.

As stated in the literature

An evaluation suite emphasizing final-answer correctness on math problems, used alongside IMO ProofBench and other proof-quality benchmarks.

Why it matters: Final-answer benchmarks are among the simplest ways to check whether a model is actually solving a math problem rather than just producing a plausible-sounding chain of reasoning.

For example, AnswerBench scores a model on whether it outputs exactly the correct integer answer to a problem, not on how convincing its reasoning looks.
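
To make the idea of final-answer scoring concrete, here is a minimal Python sketch of exact-match grading on integer answers. It is an illustration only, not AnswerBench's actual grader: the function names extract_final_integer and exact_match_accuracy are hypothetical, and treating the last integer in a response as the model's final answer is an assumption made for this example.

```python
import re
from typing import List, Optional


def extract_final_integer(response: str) -> Optional[int]:
    """Return the last integer found in the response, or None if there is none.

    Assumption for this sketch: the final answer is the last integer mentioned.
    """
    matches = re.findall(r"-?\d+", response)
    return int(matches[-1]) if matches else None


def exact_match_accuracy(responses: List[str], references: List[int]) -> float:
    """Fraction of responses whose final integer equals the reference exactly."""
    if not references or len(responses) != len(references):
        raise ValueError("need equally sized, non-empty lists")
    # The reasoning text is ignored; only the extracted final answer counts.
    correct = sum(
        extract_final_integer(resp) == ref
        for resp, ref in zip(responses, references)
    )
    return correct / len(references)


# Hypothetical model outputs and reference answers, made up for illustration.
responses = [
    "Adding the two cases gives 30 + 12, so the answer is 42.",
    "A long and elegant argument shows the answer is 7.",
    "I could not find a solution.",
]
references = [42, 8, 3]
accuracy = exact_match_accuracy(responses, references)
```

In this toy run only the first response is counted as correct: the second reaches the wrong integer despite its confident reasoning, and the third contains no integer at all.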

Heard on the show

“With random ordering, the model recovers about forty percent on AnswerBench after training.”

Episode 048 — How a 30B Open Model Reached Olympiad Gold With the Right Recipe

Mentioned in 1 episode

048
How a 30B Open Model Reached Olympiad Gold With the Right Recipe

Related terms

IMO ProofBench