Glossary · Term

steering vector


Definition

A direction in a model's internal state that, when added to that state, pushes the model toward a particular behavior.

A direction in activation space added to the residual stream during inference to bias the model toward a target behavior without retraining.
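
The addition described above can be sketched on a toy array. This is only an illustration, not any particular library's API: the function name `apply_steering`, the scaling coefficient `alpha`, and the `(seq_len, d_model)` shape are assumptions made for the example. In a real model the same addition would be applied to the activations at a chosen layer during the forward pass.

```python
import numpy as np

def apply_steering(resid, steering_vector, alpha=1.0):
    """Return resid with alpha * steering_vector added at every token position.

    resid:           array of shape (seq_len, d_model), a stand-in for
                     residual-stream activations (assumed layout).
    steering_vector: array of shape (d_model,), the steering direction.
    alpha:           hypothetical scaling coefficient for steering strength.
    """
    resid = np.asarray(resid, dtype=float)
    steering_vector = np.asarray(steering_vector, dtype=float)
    if steering_vector.shape != resid.shape[-1:]:
        raise ValueError("steering_vector must have shape (d_model,)")
    # Broadcasting adds the same vector to each position; no weights change.
    return resid + alpha * steering_vector

# Toy data: 3 token positions, 4-dimensional activations.
resid = np.zeros((3, 4))
direction = np.array([1.0, 0.0, 0.0, 0.0])
steered = apply_steering(resid, direction, alpha=2.0)
```

Because the function returns a new array, the original activations stay untouched, which mirrors the "without retraining" part of the definition: the intervention lives only in the forward pass.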

Also called: steering vectors

Related concepts