Concept · 1 episode

Emotion Vectors

Definition

Emotion vectors are directions in a model’s residual stream that, when added to or subtracted from its activations, shift the model’s outputs toward or away from a particular emotion or emotional valence. They are a special case of activation steering and a common tool in interpretability work on affect and persona.
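
To make "adding or subtracting a direction" concrete, here is a minimal sketch in plain Python. It assumes the direction is estimated as a difference of mean activations between emotional and neutral examples, which is one common recipe and not necessarily the method discussed in the episode. The toy 4-dimensional activations and the helper names (`emotion_vector`, `steer`, `projection`) are hypothetical and exist only for illustration.

```python
from math import sqrt
from statistics import fmean


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*rows)]


def emotion_vector(emotional_acts, neutral_acts):
    """Unit-norm difference of means between two sets of activations.

    Assumption: a difference of means is used to find the direction.
    """
    mu_emotional = mean_vector(emotional_acts)
    mu_neutral = mean_vector(neutral_acts)
    diff = [e - n for e, n in zip(mu_emotional, mu_neutral)]
    norm = sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def steer(hidden, direction, alpha):
    """Add (alpha > 0) or subtract (alpha < 0) the direction from a hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden, direction):
    """Scalar component of a hidden state along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))


# Made-up stand-ins for residual-stream activations at one layer.
joyful = [[1.0, 0.2, 0.0, 0.5], [0.8, 0.1, 0.1, 0.4]]
neutral = [[0.1, 0.2, 0.0, 0.5], [0.0, 0.1, 0.1, 0.4]]

v_joy = emotion_vector(joyful, neutral)

h = [0.3, 0.0, 0.2, 0.1]          # a hidden state to steer
h_more = steer(h, v_joy, 2.0)     # pushed toward the emotion
h_less = steer(h, v_joy, -2.0)    # pushed away from it
```

Because the direction is unit-norm, steering with strength `alpha` changes the hidden state's projection onto that direction by exactly `alpha`. In a real model the same addition would be applied to the residual stream at a chosen layer during the forward pass, and the choice of layer and strength is typically tuned empirically.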

Episodes covering this