Definition
Swapping a piece of a model's internal state from one run into another to test what that piece does.
An interpretability technique that substitutes internal activations from one forward pass into another to test whether a specific component is causally responsible for a behavior.
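
The substitution described above can be sketched on a toy model. This is a minimal illustration only, not a reference implementation: the two-unit ReLU network, the weights `W_IN` and `W_OUT`, the "clean" and "corrupted" inputs, and the helper names `forward` and `restored_fraction` are all hypothetical and chosen so the result is easy to check by hand. The idea is to cache the hidden activations from a clean run, rerun the model on a corrupted input with one hidden unit overwritten by its clean value, and measure how much of the clean output comes back.

```python
def hidden_activations(x, w_in):
    """ReLU hidden layer: one activation per row of w_in."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]


def forward(x, w_in, w_out, patch=None):
    """Run the toy model; `patch` maps hidden-unit index -> replacement activation."""
    h = hidden_activations(x, w_in)
    for i, value in (patch or {}).items():
        h[i] = value  # overwrite this unit with an activation from another run
    return sum(w * hi for w, hi in zip(w_out, h))


# Hypothetical toy weights: the output reads only hidden unit 0.
W_IN = [[1.0, 0.0], [0.0, 1.0]]
W_OUT = [2.0, 0.0]

clean_x = [1.0, 1.0]    # input on which the behavior appears
corrupt_x = [0.0, 0.0]  # input on which the behavior is absent

clean_h = hidden_activations(clean_x, W_IN)      # cached clean activations
clean_out = forward(clean_x, W_IN, W_OUT)
corrupt_out = forward(corrupt_x, W_IN, W_OUT)


def restored_fraction(unit):
    """Fraction of the clean-vs-corrupted output gap recovered by patching one unit."""
    patched_out = forward(corrupt_x, W_IN, W_OUT, patch={unit: clean_h[unit]})
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)


effects = [restored_fraction(u) for u in range(len(clean_h))]
print(effects)  # [1.0, 0.0]
```

In this toy setup, patching unit 0 restores the whole output gap while patching unit 1 restores none of it, which is the kind of evidence used to attribute a behavior to a specific component. With a real network the same pattern is usually applied through the framework's own mechanism for reading and overwriting intermediate activations rather than a hand-written `patch` argument.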