Glossary · Term

progressive confidence shaping

Definition

A training technique that rewards an AI for building up confidence gradually rather than locking in an answer too soon.

A confidence-trajectory-shaped reward term layered onto GRPO that penalizes front-loaded confidence (favoring monotonically increasing mid-chain commitment) to discourage premature confidence and improve faithfulness.

Mentioned in 1 episode

081
When Reasoning Models Decide Before They Think: Detecting and Fixing Premature Confidence