Definition
The running internal state inside a transformer that each layer reads from and writes to.
The persistent token-position vector inside a transformer, accumulating writes from attention and MLP layers and feeding subsequent computation.
The running internal state inside a transformer that each layer reads from and writes to.
The persistent token-position vector inside a transformer, accumulating writes from attention and MLP layers and feeding subsequent computation.