Concept · 1 episode(s)

Mixed-Policy Training

← all concepts

Definition

Mixed-policy training trains a model on trajectories drawn from more than one policy — the current model, an older checkpoint, a teacher, a human, a sampler — in a controlled mixture. It’s a way to balance exploration, stability, and quality during RL or rejection-sampling training.

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.