Definition
An open-source 30-billion-parameter model that reaches olympiad gold-medal-level mathematical reasoning through a four-stage recipe: three post-training stages followed by test-time scaling.
A mixture-of-experts reasoning model (30B total parameters, 3B active) post-trained with reverse-perplexity SFT, coarse RL on verifiable answers, and refined RL with proof-quality judging, then run with a long test-time scaling loop. It reaches gold-medal-level performance on IMO 2025 and USAMO 2026.