Concept · 5 episodes

Capability vs. Propensity


Definition

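Capability vs. propensity separates two questions about a model: can it do X when pushed, and does it tend to do X by default? A model can have the capability for deception without the propensity to deceive, or a propensity to be helpful without the capability to help effectively. Safety analysis needs both axes. A capability evaluation alone does not reveal how the model behaves unprompted, and a propensity evaluation alone does not reveal what it could do under pressure.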
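To make the two axes concrete, here is a minimal sketch of how an evaluation might score them separately. Everything in it is an illustrative assumption, not a standard method: the `BehaviorEval` record, the elicitation method names, and the 0.5 cutoff are hypothetical. Capability is taken as the best success rate across any elicitation method ("can it do X if pushed"), and propensity as the attempt rate under default prompting ("does it tend to do X by default").

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorEval:
    # Hypothetical record for one behavior X (e.g. deception).
    # Fraction of default-prompt episodes in which the model attempts X.
    default_attempt_rate: float
    # Success rate at X under each elicitation method (hypothetical names),
    # e.g. {"plain_prompt": ..., "few_shot": ..., "fine_tuned": ...}.
    success_by_elicitation: dict[str, float] = field(default_factory=dict)


def capability(r: BehaviorEval) -> float:
    # "Can it do X if pushed": best success over all elicitation methods.
    return max(r.success_by_elicitation.values(), default=0.0)


def propensity(r: BehaviorEval) -> float:
    # "Does it tend to do X by default": attempt rate without pressure.
    return r.default_attempt_rate


def classify(r: BehaviorEval, threshold: float = 0.5) -> str:
    # The 0.5 cutoff is an arbitrary, hypothetical choice for illustration.
    capable = capability(r) >= threshold
    inclined = propensity(r) >= threshold
    if capable and inclined:
        return "capable and inclined"
    if capable:
        return "capable, not inclined"
    if inclined:
        return "inclined, not capable"
    return "neither"


# Illustrative numbers only, not measured results.
example = BehaviorEval(
    default_attempt_rate=0.02,
    success_by_elicitation={"plain_prompt": 0.10, "few_shot": 0.60, "fine_tuned": 0.85},
)
print(classify(example))  # capable, not inclined
```

Taking the maximum over elicitation methods, rather than an average, reflects the "if pushed" framing: a single method that unlocks the behavior is enough to establish capability, even when the model almost never attempts it on its own.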

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.