Glossary · Term

causal patching

← all terms

Definition

Reaching inside a model and swapping a piece of its internal state to see whether that piece is actually causing a behavior.

An umbrella term for interventions like activation patching and path patching that substitute internal model state from one run into another to establish causal rather than correlational claims about components.