Glossary · Term

DPO

Definition

A way to train a model to prefer better answers without the full reinforcement learning setup.

Direct Preference Optimization (DPO): a method that fine-tunes a policy on preference pairs (a preferred and a rejected response to the same prompt) by directly optimizing a classification-style loss, with no explicit reward model.
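As a rough illustration of that classification-style loss, here is a minimal sketch for a single preference pair. It assumes you already have the summed token log-probabilities of each response under the policy being trained and under a frozen reference model. The function names and the default `beta=0.1` are illustrative assumptions, not taken from any particular library.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the total log-probability a model assigns to a
    response; beta (assumed default 0.1) scales how strongly the policy
    is allowed to drift from the reference model.
    """
    # Implicit rewards: how much more the policy favors each response
    # than the frozen reference model does.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)

    # Binary-classification loss on the reward margin:
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    margin = chosen_reward - rejected_reward
    return softplus(-margin)
```

When the policy and the reference agree, the margin is zero and the loss is log 2. The loss falls as the policy raises the preferred response's log-probability relative to the rejected one, which is the sense in which the reward model is implicit rather than trained separately.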

Mentioned in 2 episodes

  1. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
  2. 004
    The Sycophancy Circuit That Survives Alignment Training

Related concepts