Glossary · Term

propensity

Definition

How likely a model is to spontaneously do something, separate from whether it has the ability.

A model's tendency to exhibit a behavior under default conditions, distinguished from capability, which is what the model could do if pushed.
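
The distinction above can be made concrete with a small sketch. It assumes each behavior is scored as a yes/no outcome per run, with one batch of runs under default conditions and one in which the model is deliberately pushed. The `behavior_rate` helper and the trial data are hypothetical and exist only for illustration; they do not come from the episodes.

```python
from typing import Sequence


def behavior_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of runs in which the behavior was observed."""
    if not outcomes:
        raise ValueError("need at least one run")
    return sum(outcomes) / len(outcomes)


# Hypothetical run records: True means the behavior appeared in that run.
default_runs = [False, False, True, False, False,
                False, False, False, False, False]
elicited_runs = [True, True, True, False, True,
                 True, True, True, False, True]

# Propensity: how often the behavior shows up under default conditions.
propensity_estimate = behavior_rate(default_runs)

# Capability: how often it shows up when the model is pushed to do it.
capability_estimate = behavior_rate(elicited_runs)

print(f"propensity ~ {propensity_estimate:.2f}")
print(f"capability ~ {capability_estimate:.2f}")
```

In this made-up data the model can usually produce the behavior when pushed but rarely does so unprompted, which is the gap the term is meant to name.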

Mentioned in 2 episodes

  1. 054
    When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window
  2. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training

Related concepts