Definition
A direction inside a model that corresponds to a particular feeling, like fear or calm.
A linear direction in a transformer's residual stream that encodes a specific affective concept; it can be derived from the mean difference of activations over emotion-conditioned text and is causally manipulable via activation steering.
Also called: emotion vectors
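
The technical definition can be made concrete with a minimal sketch in pure Python. It is illustrative only: the activations are synthetic stand-ins rather than real residual-stream data, the neutral baseline is an assumption (the definition does not say what the emotion-conditioned mean is compared against), and the names `emotion_direction`, `steer`, and `project` are hypothetical helpers, not part of any library.

```python
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def emotion_direction(emotion_acts, baseline_acts):
    """Mean-difference direction: mean(emotion) - mean(baseline), scaled to unit length."""
    mu_e = mean_vector(emotion_acts)
    mu_b = mean_vector(baseline_acts)
    diff = [a - b for a, b in zip(mu_e, mu_b)]
    norm = sum(x * x for x in diff) ** 0.5
    return [x / norm for x in diff]


def steer(activation, direction, alpha):
    """Activation steering: add alpha * direction to one activation vector."""
    return [h + alpha * d for h, d in zip(activation, direction)]


def project(activation, direction):
    """Scalar projection of an activation onto a unit-length direction."""
    return sum(h * d for h, d in zip(activation, direction))


# Synthetic data (assumption): "fear" text shifts activations along axis 0.
rng = random.Random(0)
dim = 8
neutral_acts = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(200)]
fear_acts = [
    [rng.gauss(0, 0.1) + (2.0 if i == 0 else 0.0) for i in range(dim)]
    for _ in range(200)
]

fear_dir = emotion_direction(fear_acts, neutral_acts)

# Steering a neutral activation raises its projection on the direction by alpha.
h = neutral_acts[0]
h_steered = steer(h, fear_dir, alpha=3.0)
shift = project(h_steered, fear_dir) - project(h, fear_dir)
```

In a real model the vectors would be residual-stream activations collected at a chosen layer, and the steering step would be applied during the forward pass; both choices are left out of this sketch.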