Concept · 4 episodes

Math Benchmarks

Definition

Math benchmarks measure model performance on mathematical reasoning, from grade-school word problems (GSM8K) through competition problems (MATH) to research-level questions (FrontierMath). They’ve been one of the most active arenas of capability progress and a recurring case study in benchmark saturation, where top models score near the ceiling and the benchmark stops distinguishing between them.

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.