Definition
The way instruction tuning sharpens a model's next-token distribution, making its top choices more decisive and confident at each generation step.
The hypothesis that RLHF and instruction tuning concentrate next-token probability mass on the top-ranked tokens, so that observed phenomena such as alignment tax, calibration loss, mode collapse, and confident hallucination can be explained as effects of a single dispositional change.
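
To make "concentrating probability mass on top tokens" concrete, here is a minimal sketch of how one might quantify it for a single next-token distribution. The logits are made-up illustrative values, and the sharpening is simulated by dividing the logits by a temperature below 1. That is only a stand-in assumption for the effect the hypothesis describes, not a claim about how RLHF or instruction tuning actually changes a model. The helper names (`softmax`, `entropy`, `top_k_mass`) are hypothetical and not taken from any library.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature < 1 sharpens the result."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a more concentrated distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def top_k_mass(probs, k=1):
    """Total probability assigned to the k most likely tokens."""
    return sum(sorted(probs, reverse=True)[:k])


# Illustrative logits for five candidate tokens (assumed values, not measured).
base_logits = [2.0, 1.5, 1.0, 0.5, 0.0]

# "Base" distribution versus a sharpened one. The temperature of 0.5 is an
# assumption used to imitate concentration, not an actual tuning procedure.
base = softmax(base_logits)
sharpened = softmax(base_logits, temperature=0.5)

for name, dist in [("base", base), ("sharpened", sharpened)]:
    print(f"{name}: top-1 mass={top_k_mass(dist):.3f}, "
          f"entropy={entropy(dist):.3f} nats")
```

Under these assumptions, the sharpened distribution has higher top-1 mass and lower entropy than the base one. Comparing the same two quantities between a base model and its tuned counterpart on identical prompts would be one way to probe the hypothesis.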