Definition
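Entropy regularization adds a term to an RL or other training objective that rewards high entropy in the policy's output distribution, that is, keeping probability mass spread across actions rather than collapsing prematurely onto a single confident behavior. The term is typically scaled by a small coefficient so it nudges the policy without overriding the main objective. It is a standard way to keep exploration alive without resorting to explicit exploration bonuses.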
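As a rough illustration of the definition, the sketch below subtracts a scaled entropy term from a single-sample policy-gradient loss for a discrete action distribution. The function names, the REINFORCE-style loss, and the coefficient `beta` are assumptions chosen for this example, not something the definition prescribes.

```python
import math

def log_softmax(logits):
    # Numerically stable log-probabilities from unnormalized logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def entropy(log_probs):
    # Shannon entropy in nats: H = -sum_a p(a) * log p(a).
    return -sum(math.exp(lp) * lp for lp in log_probs)

def regularized_loss(logits, action, advantage, beta=0.01):
    # Hypothetical single-sample loss: the policy-gradient term minus
    # beta * entropy, so minimizing the loss rewards a spread-out policy.
    log_probs = log_softmax(logits)
    pg_loss = -advantage * log_probs[action]
    return pg_loss - beta * entropy(log_probs)

uniform = log_softmax([0.0, 0.0, 0.0, 0.0])
peaked = log_softmax([10.0, 0.0, 0.0, 0.0])
print(entropy(uniform) > entropy(peaked))  # uniform policy has higher entropy
```

With `beta = 0` this reduces to the unregularized loss; a larger `beta` pushes harder toward a spread-out distribution, at the cost of slower convergence to a confident policy.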