Glossary · Term

ReasonMaxxer

Definition

A cheap way to train math reasoning into a model by supervising only the few token positions where the model is most uncertain.

A post-training method that uses entropy-gated contrastive supervision on a small set of high-uncertainty token positions in failed rollouts, reproducing RL-style reasoning gains at roughly three orders of magnitude lower cost.
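The definition above can be made concrete with a toy sketch. It is illustrative only and not the method's actual implementation: the entropy threshold, the cap on positions, the log-ratio loss, and every name here (`token_entropy`, `gate_positions`, `contrastive_loss`, the toy probabilities) are assumptions chosen for the example. It shows the two ideas in the definition: gating on per-token entropy in a failed rollout, and applying a contrastive term only at the gated positions.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def gate_positions(step_probs, threshold, max_positions):
    """Indices of the highest-entropy positions at or above `threshold`.

    Both `threshold` and `max_positions` are hypothetical knobs.
    """
    scored = [(token_entropy(p), i) for i, p in enumerate(step_probs)]
    kept = [(h, i) for h, i in scored if h >= threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # most uncertain first
    return sorted(i for _, i in kept[:max_positions])

def contrastive_loss(step_probs, positions, rejected, preferred):
    """Mean of log p(rejected) - log p(preferred) over gated positions.

    Assumed loss form: minimizing it raises the preferred token's
    probability relative to the token taken in the failed rollout.
    """
    if not positions:
        return 0.0
    total = 0.0
    for i in positions:
        p = step_probs[i]
        total += math.log(p[rejected[i]]) - math.log(p[preferred[i]])
    return total / len(positions)

# Toy failed rollout: 4 positions over a 3-token vocabulary.
rollout_probs = [
    [0.98, 0.01, 0.01],  # confident, ignored by the gate
    [0.40, 0.35, 0.25],  # uncertain
    [0.90, 0.05, 0.05],  # confident, ignored by the gate
    [0.34, 0.33, 0.33],  # near-uniform, most uncertain
]
rejected = [0, 0, 0, 0]    # token ids the failed rollout actually took
preferred = [0, 1, 0, 2]   # hypothetical better token ids per position

positions = gate_positions(rollout_probs, threshold=0.8, max_positions=2)
loss = contrastive_loss(rollout_probs, positions, rejected, preferred)
print(positions, round(loss, 4))
```

In this toy case only two of the four positions pass the gate, so the loss touches half the sequence. In a real rollout with thousands of tokens the gated fraction would presumably be far smaller, which is where the cost saving in the definition would come from.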

Mentioned in 1 episode

  1. Episode 026: What RL Actually Does to Language Models, at the Token Level