Glossary · Term

sleep-time compute

Definition

Doing expensive AI work in advance, when no user is waiting, so the live response stays fast.

A family of techniques (including the contemporary Lin et al. work and the "Language Models Need Sleep" paper) that perform offline pre-processing — pre-generating likely queries or running depth-recurrent consolidation — between user requests to reduce inference latency.

Also called: sleep

Mentioned in 3 episodes

085
Why Long-Context Models Might Need Compute, Not Capacity, Before Eviction
064
When Agent Memory Stops Being a Database and Starts Being a Skill
019
When the Best Reward Model Trains the Worst Policy: Inside EvoLM