Definition
A research-math benchmark of ten genuinely hard problems contributed by working research mathematicians, including Dan Spielman, Martin Hairer, Andrew Blumberg, and Shmuel Weinberger. It was used to evaluate RMA, GPT-5.2R, and Aletheia.