Topic · 12 episodes across 5 reviews
Rethinking Attention, Memory, and Latent Compute
A run of architecture papers questioning the transformer's defaults: how it retrieves information over long contexts, whether it needs a KV cache at all, and whether it should carry computation between tokens instead of rebuilding it from scratch at every step.