Definition
A Transformer variant that lets each layer keep a small persistent state across tokens.
In SST, each layer maintains its own sticky vector, which is gated and blended into that layer's input at the next token.
Also called: SST
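Example (illustrative sketch only)
The definition above does not say how the gate is computed or how the sticky vector is updated, so the following pure-Python toy fills those gaps with assumptions: a fixed per-dimension sigmoid gate, a tanh placeholder in place of the real attention/MLP sublayer, and a sticky vector that is simply overwritten with the layer's latest output. The names StickyLayer, gate_logits, sublayer, and run_sequence are hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class StickyLayer:
    """Toy layer that carries a sticky vector from one token to the next."""

    def __init__(self, dim, gate_logit=0.0):
        # Persistent per-layer state; starts at zero before the first token.
        self.sticky = [0.0] * dim
        # Assumption: one gate parameter per dimension, independent of the input.
        self.gate_logits = [gate_logit] * dim

    def sublayer(self, x):
        # Assumption: placeholder for the real attention/MLP computation.
        return [math.tanh(v) for v in x]

    def step(self, x):
        # Blend the previous token's sticky vector into this token's input.
        gates = [sigmoid(g) for g in self.gate_logits]
        blended = [
            (1.0 - g) * xi + g * si
            for g, xi, si in zip(gates, x, self.sticky)
        ]
        out = self.sublayer(blended)
        # Assumption: the sticky vector is overwritten with the latest output.
        self.sticky = out
        return out


def run_sequence(layers, tokens):
    """Feed token vectors one at a time through a stack of sticky layers."""
    outputs = []
    for x in tokens:
        for layer in layers:
            x = layer.step(x)
        outputs.append(x)
    return outputs


if __name__ == "__main__":
    stack = [StickyLayer(dim=2), StickyLayer(dim=2)]
    for y in run_sequence(stack, [[1.0, -1.0], [0.0, 0.0]]):
        print(y)
```

In this sketch the second token is all zeros, yet its output is nonzero because each layer still holds state from the first token. Pushing gate_logit strongly negative drives the gate toward zero, so the layer behaves almost as if it had no sticky state.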