Concept · 4 episodes

Hybrid SSM/Attention

Definition

Hybrid SSM–attention architectures interleave state-space model (SSM) layers, such as Mamba, with transformer attention layers, aiming to combine SSMs’ cheap long-context handling with attention’s strong in-context retrieval. The open empirical question is on which mix of tasks the hybrid actually keeps both advantages.
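
To make the interleaving idea concrete, here is a minimal Python sketch of a layer schedule. It is an illustration under stated assumptions, not any particular model's recipe: the function names and the `attention_every` knob are hypothetical, and the cost function only captures the familiar scaling (token mixing that is linear in sequence length for SSM layers, quadratic for full attention), ignoring hidden size, constants, and MLP blocks.

```python
from typing import List


def hybrid_layer_schedule(n_layers: int, attention_every: int) -> List[str]:
    """Return a layer-type list where every `attention_every`-th layer is attention.

    Both parameters are hypothetical knobs for illustration; real hybrid
    models choose their own ratios and placements.
    """
    if n_layers < 0 or attention_every < 1:
        raise ValueError("n_layers must be >= 0 and attention_every >= 1")
    return [
        "attention" if (i + 1) % attention_every == 0 else "ssm"
        for i in range(n_layers)
    ]


def relative_mixing_cost(schedule: List[str], seq_len: int) -> int:
    """Crude token-mixing cost: seq_len per SSM layer, seq_len**2 per attention layer.

    Only the scaling with sequence length is meaningful here.
    """
    return sum(seq_len if kind == "ssm" else seq_len * seq_len for kind in schedule)


# Example: 8 layers with attention in every 4th position.
schedule = hybrid_layer_schedule(n_layers=8, attention_every=4)
print(schedule)
```

Setting `attention_every=1` gives an all-attention stack, so comparing `relative_mixing_cost` for the two schedules at a long `seq_len` shows roughly where the hoped-for savings come from. Whether retrieval quality survives with so few attention layers is the empirical question the definition raises.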

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.