progressive confidence shaping

Definition

A training technique that rewards an AI for building up confidence gradually rather than locking in an answer too soon.

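A reward term, layered onto GRPO (Group Relative Policy Optimization), that is shaped by the model's confidence trajectory across its reasoning chain. It penalizes front-loaded confidence and favors commitment that rises monotonically through the middle of the chain, with the aim of discouraging premature confidence and improving faithfulness.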
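
As a rough illustration of the idea, the sketch below adds a trajectory-based shaping term to each completion's task reward before computing group-relative advantages in the GRPO style. Everything specific here is an assumption made for illustration and is not taken from the episode: the function names, the per-step confidence list as input, the two penalty forms (mean early confidence and summed step-to-step drops), and the weights `front_fraction`, `front_weight`, `drop_weight`, and `coef`.

```python
from math import ceil
from statistics import mean, pstdev


def trajectory_shaping_term(confidences, front_fraction=0.25,
                            front_weight=1.0, drop_weight=1.0):
    """Hypothetical shaping term; never positive for confidences in [0, 1]."""
    if not confidences:
        return 0.0
    # Front-loading penalty: mean confidence over the earliest steps.
    k = max(1, ceil(front_fraction * len(confidences)))
    front_penalty = mean(confidences[:k])
    # Monotonicity penalty: total size of any step-to-step confidence drops.
    drop_penalty = sum(max(0.0, a - b)
                       for a, b in zip(confidences, confidences[1:]))
    return -(front_weight * front_penalty + drop_weight * drop_penalty)


def shaped_rewards(task_rewards, trajectories, coef=0.1):
    """Add the (assumed) shaping term to each completion's task reward."""
    return [r + coef * trajectory_shaping_term(c)
            for r, c in zip(task_rewards, trajectories)]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two completions with the same task reward but different trajectories.
task_rewards = [1.0, 1.0]
trajectories = [
    [0.9, 0.9, 0.9, 0.9],  # confident from the first step
    [0.2, 0.4, 0.6, 0.9],  # confidence builds through the chain
]
advantages = group_relative_advantages(
    shaped_rewards(task_rewards, trajectories))
```

With equal task rewards, the completion whose confidence builds gradually ends up with the higher advantage, which is the behavior the definition describes. How per-step confidence is actually measured is left open here.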

Mentioned in 1 episode

  1. 081
    When Reasoning Models Decide Before They Think: Detecting and Fixing Premature Confidence