Definition
A small running summary inside a sequence model that gets updated as new tokens arrive.
A fixed-size recurrent state (matrix or vector) in state-space and linear-attention models that compresses sequence history through gated overwrites; contrasts with the lossless but growing KV cache of attention layers.
Also called: fast weights