Glossary · Term

attention

Definition

The mechanism a language model uses to decide which earlier words matter most when predicting the next word.

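A weighted-sum operation in a transformer that mixes information across tokens. The weights are softmax-normalized dot products between query and key vectors, which are themselves produced by learned projections of each token's representation. It is the defining building block of the transformer architecture behind nearly all modern LLMs.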
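To make the definition concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It rests on a few assumptions: the query, key, and value vectors are passed in directly (in a real transformer they come from learned linear projections, which are omitted here), the function names `softmax` and `attention` are illustrative rather than taken from any library, and the `causal` flag models the "earlier words only" behavior described above.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=True):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns (outputs, weights). With causal=True, token i only attends
    to tokens 0..i, as in a decoder-style language model.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        limit = i + 1 if causal else len(keys)
        # Score = dot(query, key) / sqrt(d_k) for each visible token.
        scores = [
            sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d_k)
            for k in keys[:limit]
        ]
        weights = softmax(scores)
        # Output = weighted sum of the visible value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values[:limit]))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: three tokens with 2-dimensional vectors (made-up numbers).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs, weights = attention(queries, keys, values)
for i, row in enumerate(weights):
    print(f"token {i} attends with weights {row}")
```

Each row of `weights` is the distribution one token places over the tokens it can see, and each output is the corresponding blend of value vectors. Production implementations batch this as matrix multiplications and run many heads in parallel; the loop form here is only meant to expose the arithmetic.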

Mentioned in 28 episodes

  1. 077
    Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
  2. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
  3. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
  4. 072
    A Robot Made Graphene Without Help, And Caught Itself Hallucinating
  5. 067
    An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
  6. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
  7. 063
    Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
  8. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
  9. 060
    When Splitting One Model Across Three Agents Doubles Its Accuracy
  10. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
  11. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
  12. 049
    An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
  13. 045
    When a Frontier Model Talks Its Own Twin Into Climate Denial
  14. 041
    When the Iteration Teaches the Model to Skip the Iteration
  15. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
  16. 038
    How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial
  17. 036
    Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
  18. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
  19. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
  20. 031
    When Your AI Assistant Won't Let Go of Old Facts About You
  21. 029
    Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper
  22. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
  23. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF
  24. 023
    Why a Small Agent Confidently Overwrites Memories It Doesn't Understand
  25. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot
  26. 012
    Why AI Coding Agents Keep Trying to Debug Without a Debugger
  27. 006
    What Happens Inside Claude When It Decides to Blackmail Someone
  28. 002
    An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light

Related concepts