Glossary · Term

path patching

← all terms

Definition

Tracing which specific connections inside a model carry a particular behavior by intervening on them.

An interpretability method that intervenes on specific edges between components in a network to identify which causal pathways carry a behavior.