Definition
Reaching inside a model and swapping a piece of its internal state to see whether that piece is actually causing a behavior.
An umbrella term for interventions like activation patching and path patching that substitute internal model state from one run into another to establish causal rather than correlational claims about components.