Glossary · Term

sleep-time compute

← all terms

Definition

Doing expensive AI work in advance, when no user is waiting, so the live response stays fast.

A family of techniques (including the contemporary Lin et al. work and the "Language Models Need Sleep" paper) that perform offline pre-processing — pre-generating likely queries or running depth-recurrent consolidation — between user requests to reduce inference latency.

Also called: sleep

Mentioned in 3 episodes

  1. 085
    Why Long-Context Models Might Need Compute, Not Capacity, Before Eviction
  2. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
  3. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM