Definition
A direction in a model's internal activations that, when added to them, pushes the model toward a particular behavior.
A vector in activation space that is added to the residual stream at inference time to bias the model toward a target behavior without retraining.
Also called: steering vectors
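
The addition described in the definition can be illustrated with a minimal sketch. It assumes the residual stream at one layer is a list of per-token hidden vectors, and that the same scaled direction is added at every token position. The function name `add_steering_vector`, the `scale` parameter, and all numeric values are hypothetical and chosen only for illustration; real implementations typically apply this inside a model's forward pass with a tensor library.

```python
from typing import List

Vector = List[float]


def add_steering_vector(
    residual: List[Vector], direction: Vector, scale: float = 1.0
) -> List[Vector]:
    """Return a copy of `residual` with `scale * direction` added at every token position."""
    if any(len(row) != len(direction) for row in residual):
        raise ValueError("direction must match the hidden size of each residual row")
    # Element-wise addition, applied independently to each token's hidden vector.
    return [[h + scale * d for h, d in zip(row, direction)] for row in residual]


# Toy residual stream: 2 token positions, hidden size 3 (made-up values).
residual = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
direction = [0.0, 2.0, 0.0]  # hypothetical steering direction

steered = add_steering_vector(residual, direction, scale=0.5)
```

The model's weights are never touched: only the activations change, which is why no retraining is involved. In this sketch, `scale` controls how strongly the activations are shifted along the direction.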