SU-01 · Glossary · AI Papers: A Deep Dive

Definition

Plain language

An open-source 30-billion-parameter model that reaches olympiad gold-medal-level math reasoning through a four-stage recipe: three post-training stages followed by a test-time scaling loop.

As stated in the literature

A 30B/3B-active mixture-of-experts reasoning model post-trained with reverse-perplexity SFT, coarse RL on verifiable answers, refined RL with proof-quality judging, and a long test-time scaling loop, reaching gold-medal-level performance on IMO 2025 and USAMO 2026.
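
The "30B/3B-active" shorthand above means the model stores about 30 billion parameters but routes each token through only about 3 billion of them. A minimal sketch of that arithmetic follows; the function name and the rounded parameter counts are illustrative assumptions taken from the figures in this entry, not exact values from the paper.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected total_params > 0 and 0 <= active_params <= total_params")
    return active_params / total_params


# Rounded figures from the definition (assumed, not exact counts).
TOTAL_PARAMS = 30e9   # parameters stored in the model
ACTIVE_PARAMS = 3e9   # parameters used for any single token

fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active per token: {fraction:.0%}")
```

With these rounded figures, roughly one tenth of the weights take part in processing any given token, which is why a mixture-of-experts model can be much cheaper to run per token than a dense model of the same total size.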

Why it matters: It shows that olympiad-level mathematical reasoning is reachable with open weights and a public training recipe rather than only inside closed labs.

For example, SU-01 can produce a full written proof for a 2025 International Mathematical Olympiad problem at gold-medal quality.

Heard on the show

“SU-01 *without* TTS scores in bronze territory on IMO 2025 — about twenty-one out of forty-two.”

Episode 048 — How a 30B Open Model Reached Olympiad Gold With the Right Recipe

Related terms

IMO · mixture-of-experts · perplexity · post-training · reasoning model · reinforcement learning · SFT · test-time scaling · USAMO