Topic · 12 episodes across 5 reviews

Rethinking Attention, Memory, and Latent Compute

A run of architecture papers questioning the transformer's defaults: how it retrieves over long context, whether it needs a KV cache at all, and whether it should carry computation between tokens instead of rebuilding from scratch.

Covered in these reviews