Glossary · Term

LUFFY

← all terms

Definition

A mixed-policy training method that blends expert demonstrations with the model's own attempts in a single RL loop.

A mixed-policy post-training method that interleaves teacher demonstrations with policy rollouts inside a unified RL stage, used as a baseline in math-reasoning post-training comparisons.

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers