Definition
A type of language model that maintains a fixed-size memory instead of attending to every previous token.
A state-space model architecture that processes sequences with selective recurrent state updates, offering linear-time inference and constant memory per layer.
Also called: Mamba-2, Mamba-3