KV cache

Definition

The model's short-term memory of everything it has read so far in the current conversation.

The stored attention keys and values from previous tokens, which a transformer reuses during generation instead of recomputing them for every new token. The cache grows linearly with context length and often dominates inference memory.
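To make the linear growth concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a standard decoder-only transformer that stores one key tensor and one value tensor per layer. The function name `kv_cache_bytes` and the example configuration (32 layers, 32 KV heads, head dimension 128, 16-bit values) are illustrative assumptions, not figures from any episode.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes (hypothetical helper).

    Assumes every layer caches one key and one value vector of size
    num_kv_heads * head_dim for each token in each sequence.
    """
    # Factor of 2: one tensor for keys, one for values.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


# Illustrative configuration (an assumption, not a specific model).
config = dict(num_layers=32, num_kv_heads=32, head_dim=128)

for seq_len in (1024, 4096, 8192):
    size_gib = kv_cache_bytes(seq_len=seq_len, **config) / 2**30
    print(f"{seq_len:>5} tokens: {size_gib:.2f} GiB")
```

Under these assumptions each token costs 512 KiB of cache, so doubling the context doubles the memory. Techniques that reduce the number of KV heads or the bytes per element shrink the per-token cost but leave the linear dependence on context length in place.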

Also called: KV-cache, key-value cache, KV caches

Mentioned in 5 episodes

  1. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
  2. 036
    Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
  3. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
  4. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
  5. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot

Related concepts