Glossary · Term

CPU offloading

← all terms

Definition

Letting a training job stash bookkeeping data in regular system memory because the GPU's memory isn't big enough.

A DeepSpeed feature that moves optimizer state and gradients from GPU HBM to host CPU RAM to fit larger models on memory-constrained hardware; interaction with gradient accumulation is the source of a silent bug documented in the SFT-then-RL paper.

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers