Glossary · Term

ReLIFT

← all terms

Definition

A mixed-policy reasoning post-training method.

A reinforcement-learning-from-trajectory post-training method that mixes expert demonstrations with on-policy rollouts; used as a comparison baseline in math-reasoning post-training studies.

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers