Definition
Emotion vectors are directions in a model’s residual stream that, when added or subtracted, shift its outputs toward or away from a particular emotional valence. They’re a special case of activation steering and a popular probe in interpretability work on affect and persona.