Glossary · Term

SRFT

Definition

A mixed-policy reasoning post-training method.

Supervised Reinforcement Fine-Tuning: a mixed-policy post-training method that blends supervised data with on-policy data. It appears as a baseline in math-reasoning post-training comparisons where the headline gains turned out to depend on infrastructure bugs.

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers