Definition
Doing expensive AI work in advance, when no user is waiting, so the live response stays fast.
A family of techniques that perform offline pre-processing between user requests, such as pre-generating likely queries or running depth-recurrent consolidation, to reduce inference latency. Examples include the contemporary work of Lin et al. and the "Language Models Need Sleep" paper.
Also called: sleep
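
Example
The query pre-generation variant described above can be sketched roughly as follows. This is a minimal illustration, not the method of either cited work: `SleepTimeCache`, `toy_model`, and the exact-match lookup are all hypothetical names and simplifications introduced here, and the sketch assumes a single fixed context per cache.

```python
from typing import Callable, Dict, Iterable


class SleepTimeCache:
    """Toy illustration: precompute answers offline, serve them at request time."""

    def __init__(self, model: Callable[[str, str], str]) -> None:
        self.model = model        # hypothetical stand-in for an expensive model call
        self.cache: Dict[str, str] = {}
        self.live_calls = 0       # model calls made while a user is waiting

    def sleep(self, context: str, likely_queries: Iterable[str]) -> None:
        """Run between requests: answer anticipated queries ahead of time."""
        for query in likely_queries:
            self.cache[query] = self.model(context, query)

    def answer(self, context: str, query: str) -> str:
        """Run at request time: reuse precomputed work when the guess was right."""
        if query in self.cache:   # assumption: exact string match on the query
            return self.cache[query]
        self.live_calls += 1      # miss: pay the full cost while the user waits
        return self.model(context, query)


def toy_model(context: str, query: str) -> str:
    # Placeholder for real inference; it only echoes its inputs.
    return f"answer({query!r} | {len(context)} chars of context)"


if __name__ == "__main__":
    cache = SleepTimeCache(toy_model)
    cache.sleep("some document", ["What is the main claim?"])   # idle period
    print(cache.answer("some document", "What is the main claim?"))
    print("live model calls:", cache.live_calls)
```

A correctly anticipated query is served without any live model call; a query that was not anticipated falls back to ordinary inference, so the latency benefit depends on how well the offline step predicts what users will ask.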