Glossary · Term

State Stream Transformer

Definition

A Transformer variant that lets each layer keep a small persistent state across tokens.

The State Stream Transformer (SST) is an architecture in which each layer maintains its own sticky vector. At the next token, that vector is gated and blended into the layer's input.
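The gate-and-blend step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the published formulation: the gate is taken to be a sigmoid of a single scalar per layer, the blend is a convex mix of the fresh input and the sticky vector, and the sticky vector is simply overwritten with the layer's latest output. The names `StickyLayer`, `blend`, and `run_tokens` are hypothetical, and `fn` stands in for a real Transformer layer.

```python
import math
from typing import Callable, List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Squash a gate logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def blend(x: Vector, sticky: Vector, gate: float) -> Vector:
    """Convex mix of the fresh input and the sticky vector: (1 - g) * x + g * s."""
    return [(1.0 - gate) * xi + gate * si for xi, si in zip(x, sticky)]


class StickyLayer:
    """One layer with its own persistent vector (hypothetical sketch)."""

    def __init__(self, dim: int, gate_logit: float, fn: Callable[[Vector], Vector]):
        self.sticky: Vector = [0.0] * dim   # state carried from token to token
        self.gate = sigmoid(gate_logit)     # assumption: one scalar gate per layer
        self.fn = fn                        # placeholder for the real layer computation

    def step(self, x: Vector) -> Vector:
        mixed = blend(x, self.sticky, self.gate)  # blend state into the layer input
        out = self.fn(mixed)
        self.sticky = list(out)  # assumption: state is the layer's latest output
        return out


def run_tokens(layers: List[StickyLayer], tokens: List[Vector]) -> List[Vector]:
    """Feed tokens one at a time; each layer keeps its own state across tokens."""
    outputs = []
    for x in tokens:
        h = x
        for layer in layers:
            h = layer.step(h)
        outputs.append(h)
    return outputs
```

In this sketch a strongly negative `gate_logit` drives the gate toward zero, so the layer behaves like an ordinary stateless layer. A gate near one makes the layer lean mostly on what it computed for the previous token.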

Also called: SST

Mentioned in 1 episode

  1. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking