Concept · 3 episode(s)

KL Divergence

Definition

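KL divergence measures how far one probability distribution sits from another. It is asymmetric: KL(P || Q), the extra surprise (in nats or bits) incurred when Q is used to model data drawn from P, generally differs from KL(Q || P), so it is not a true distance. It’s a foundational tool in ML for everything from VAEs to RLHF, where it’s used to keep a fine-tuned policy from drifting too far from a reference.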
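
For discrete distributions the standard definition is KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x)). The sketch below is a minimal illustration of that formula and of the asymmetry, not code from any of the episodes. The function name `kl_divergence` and the two toy distributions are made up for this example.

```python
import math

def kl_divergence(p, q, base=math.e):
    """KL(P || Q) for discrete distributions given as equal-length sequences.

    base=math.e gives nats, base=2 gives bits. (Hypothetical helper.)
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # by convention, 0 * log(0 / q) contributes 0
        if qi == 0:
            return math.inf  # P puts mass where Q puts none
        total += pi * math.log(pi / qi)
    return total / math.log(base)  # convert from nats to the requested base

# Toy distributions over two outcomes (illustrative values only).
p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kl_divergence(p, q)          # KL(P || Q) in nats
kl_qp = kl_divergence(q, p)          # KL(Q || P) in nats
kl_pq_bits = kl_divergence(p, q, 2)  # same quantity in bits

print(f"KL(P||Q) = {kl_pq:.4f} nats, KL(Q||P) = {kl_qp:.4f} nats")
```

Swapping the arguments changes the result, which is the asymmetry the definition refers to. In the RLHF setting mentioned above, P would typically be the fine-tuned policy's token distribution and Q the reference model's, though the exact estimator varies by method.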

Episodes covering this

173
The Free Step-Level Grader Hiding in Every RL Training Run
Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents
Oh, Li, Park et al. · University of Wisconsin–Madison · 22 min · Jun 25, 2026
026
What RL Actually Does to Language Models, at the Token Level
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
Akgül, Kannan, Neiswanger et al. · University of Southern California · 22 min · May 08, 2026
010
When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
RAGEN-2: Reasoning Collapse in Agentic RL
Wang, Gui, Jin et al. · Northwestern University · 22 min · May 02, 2026