Glossary · Term

First Proof

Definition

A benchmark of ten genuinely hard math problems contributed by working research mathematicians.

A research-math benchmark of ten problems contributed by mathematicians including Dan Spielman, Martin Hairer, Andrew Blumberg, and Shmuel Weinberger. The benchmark was used to evaluate RMA, GPT-5.2R, and Aletheia.

Mentioned in 1 episode

  1. Episode 076: Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math