Definition
Tracing which specific connections inside a model carry a particular behavior by intervening on those connections and observing how the behavior changes.
An interpretability method that intervenes on specific edges between components in a network to identify which causal pathways carry a behavior.
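
As an illustration of the definition above, here is a minimal sketch of an edge-level intervention on a toy three-node graph. Everything in it is a hypothetical stand-in rather than a real model or library API: the components `comp_a`, `comp_b`, and `readout`, their weights, the edge names "A->B" and "A->out", and the choice to replace an edge's value with the one from a second ("corrupted") input. The point is only that each edge is overridden separately, so the direct path A->out can be told apart from the indirect path A->B->out even though both start at the same component.

```python
# Toy graph (all names and weights are illustrative assumptions):
#   x -> A,  A -> B,  A -> out,  B -> out

def comp_a(x):
    # Component A reads the input.
    return 2.0 * x

def comp_b(a_in):
    # Component B reads whatever arrives on the edge A->B.
    return 5.0 * a_in

def readout(a_in, b_in):
    # Output node reads the edge A->out and the edge B->out.
    return 0.5 * a_in + 1.0 * b_in

EDGES = ("A->B", "A->out")

def run(x_clean, x_corrupt, patched_edges=frozenset()):
    """Run on the clean input, but for each edge in `patched_edges`
    send the value A would have produced on the corrupted input."""
    a_clean = comp_a(x_clean)
    a_corrupt = comp_a(x_corrupt)

    # Each edge carries its own copy of A's output, so it can be
    # overridden without touching the other edge.
    a_to_b = a_corrupt if "A->B" in patched_edges else a_clean
    a_to_out = a_corrupt if "A->out" in patched_edges else a_clean

    b = comp_b(a_to_b)
    return readout(a_to_out, b)

def edge_effects(x_clean, x_corrupt):
    """Change in output caused by patching each edge on its own."""
    baseline = run(x_clean, x_corrupt)
    return {
        edge: run(x_clean, x_corrupt, frozenset({edge})) - baseline
        for edge in EDGES
    }

effects = edge_effects(x_clean=1.0, x_corrupt=0.0)
for edge, delta in effects.items():
    print(edge, delta)
```

In this toy setup, patching the edge A->B shifts the output far more than patching A->out, which is the sense in which the pathway through B "carries" the behavior. Intervening on node A as a whole would change both edges at once and could not separate the two pathways.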