Prefix-RFT

Definition

A mixed-policy reasoning post-training method.

A reinforcement fine-tuning (RFT) variant that interleaves expert-prefix demonstrations with on-policy rollouts. It is one of several mixed-policy baselines whose claimed gains over SFT-then-RL turned out to be artifacts of training-pipeline bugs.
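
To make "expert-prefix demonstrations with on-policy rollouts" concrete, here is a minimal sketch of one way such a mixed rollout could be assembled. It assumes that a random-length prefix of an expert trajectory is kept verbatim and the current policy generates the remainder. The names `prefix_rollout` and `continue_fn` and the uniform cut-point sampling are illustrative assumptions, not details taken from the method's published implementation.

```python
import random
from typing import Callable, Dict, List, Sequence


def prefix_rollout(
    expert_tokens: Sequence[str],
    continue_fn: Callable[[List[str]], List[str]],
    rng: random.Random,
) -> Dict[str, object]:
    """Build one mixed trajectory: expert prefix + on-policy continuation.

    `continue_fn` is a hypothetical stand-in for sampling from the current
    policy given the prefix as context.
    """
    # Assumption: the cut point is uniform over [0, len(expert_tokens)].
    # cut == 0 is a pure on-policy rollout; cut == len(...) keeps the whole
    # expert trajectory before the policy continues.
    cut = rng.randint(0, len(expert_tokens))
    prefix = list(expert_tokens[:cut])

    # The policy continues from the expert prefix.
    continuation = list(continue_fn(prefix))

    return {
        "tokens": prefix + continuation,
        "prefix_len": cut,
        # Marks which tokens came from the expert, so a training loop could
        # treat the two segments differently if it chose to.
        "from_expert": [True] * cut + [False] * len(continuation),
    }
```

In practice `continue_fn` would wrap the model's sampling call; how the expert and on-policy segments are weighted in the loss is left out here because the definition above does not specify it.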

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers