Definition
A cheap way to train math reasoning into a model by editing only the few tokens where it actually matters.
A post-training method that uses entropy-gated contrastive supervision on a small set of high-uncertainty token positions in failed rollouts, reproducing RL-style reasoning gains at roughly three orders of magnitude lower cost.