Concept · 8 episodes

Transformer Attention

Definition

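Transformer attention is the core operation of the architecture: each token computes a weighted average of value vectors from the tokens it can attend to (including itself), where the weights come from the learned similarity between its query and their keys. The other components of a transformer block, such as the feed-forward layer, residual connections, and normalization, are arranged around this one mechanism.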
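A minimal sketch of the standard scaled dot-product form, softmax(Q·Kᵀ / √d_k)·V, written in plain Python so every step is visible. It assumes the queries, keys, and values have already been produced; the learned projection matrices (W_Q, W_K, W_V) and the splitting into multiple heads are deliberately left out. The `causal` flag is an illustrative option that hides later positions, as in decoder-only language models.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability;
    # masked entries of -inf become exp(-inf) = 0.0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of vectors, one row per token.

    Assumes queries, keys, and values are already projected; the learned
    W_Q, W_K, W_V matrices and multi-head splitting are omitted.
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Similarity of this token's query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if causal:
            # Hide future positions so token i attends only to tokens 0..i.
            scores = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        dim_v = len(values[0])
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim_v)])
    return outputs

if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    out = attention(q, k, v, causal=True)
    print(out[0])  # [1.0, 0.0]: the first token can only attend to itself
```

Because the weights for each token come out of a softmax, they are non-negative and sum to one, so every output is a convex combination of the value vectors. Dividing by √d_k keeps the dot products from growing with the vector dimension and pushing the softmax into a near one-hot regime.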

Episodes covering this

198
The Model That Knows the Answer and Can't Say It
Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale
Gollapudi, Gupta, Singhal et al. · UC Berkeley · 17 min · Jul 03, 2026
145
Building Forgetting Into a Language Model With One Extra Line of Code
Natively Unlearnable Large Language Models
Ghosal, Maini, Raghunathan · Carnegie Mellon University · 22 min · Jun 15, 2026
108
The Reasoning Cliff: Why Thinking Longer Makes Models Worse at Exact Step-by-Step Tasks
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
Guo, Wu, Yiu · The University of Hong Kong · 32 min · Jun 03, 2026
074
How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
HRM-Text: Efficient Pretraining Beyond Scaling
Wang, Liu, Wang et al. · Sapient Intelligence · 21 min · May 24, 2026
041
When the Iteration Teaches the Model to Skip the Iteration
Solve the Loop: Attractor Models for Language and Reasoning
Fein-Ashley, Rashidinejad · University of Southern California · 30 min · May 13, 2026
036
Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
Dehghankar, Asudeh · University of Illinois Chicago · 24 min · May 11, 2026
033
Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
Sridhar, Johansen · California · 24 min · May 11, 2026
002
An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light
End-to-end autonomous scientific discovery on a real optical platform
Yang, Chen, Zhao et al. · Zhejiang University · 29 min · May 01, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend for this topic.