Glossary · Term

SU-01

Definition

An open-source 30-billion-parameter model that achieves olympiad-gold-level math reasoning via a four-stage training recipe.

A mixture-of-experts reasoning model with 30B total parameters, 3B of them active per token, built with a four-stage recipe: reverse-perplexity SFT, coarse RL on verifiable answers, refined RL with proof-quality judging, and a long test-time scaling loop. It reaches gold-medal-level performance on IMO 2025 and USAMO 2026.
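
The split between 30B total and 3B active parameters in the definition reflects sparse expert routing: for each token, a router selects a few experts and the rest stay idle. The following is a minimal sketch of generic top-k routing, not SU-01's actual router. The function name `top_k_route`, the eight experts, the choice of k=2, and the router scores are all hypothetical values chosen for illustration.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over only the chosen logits so their weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Hypothetical router scores for 8 experts; only 2 of them run for this token.
routes = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2], k=2)

# Share of parameters used per token, from the 30B total / 3B active figures.
active_fraction = 3e9 / 30e9
```

With these figures, roughly a tenth of the model's parameters take part in any single token's forward pass, which is why per-token compute is closer to that of a 3B dense model than a 30B one.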

Mentioned in 1 episode

  1. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe