Definition
A mixed-policy reasoning post-training method.
Self-Refinement Fine-Tuning is a mixed-policy post-training method that blends supervised data with on-policy data. It appears as a baseline in math-reasoning post-training comparisons whose headline gains turned out to depend on infrastructure bugs.