Glossary · Term

gradient accumulation


Definition

Adding up gradients from several small batches before updating the model, as a workaround when one big batch won't fit in memory.

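A training technique that processes mini-batches sequentially and accumulates their gradients before a single optimizer step. The gradients are summed, usually with each mini-batch loss scaled by 1/k for k mini-batches so that the total matches the mean over the combined batch. This simulates a larger effective batch on memory-constrained hardware.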
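To make the mechanics concrete, here is a minimal sketch in plain Python. It is not tied to any particular framework. It assumes a toy 1-D linear model y ≈ w * x with mean-squared-error loss, equal-size micro-batches, and plain SGD. The helper names (`grad_mse`, `full_batch_step`, `accumulated_step`) are hypothetical. The sketch shows that summing micro-batch gradients scaled by 1/accum_steps, then taking one step, gives the same update as one step on the whole batch.

```python
# Hypothetical toy example: 1-D linear model y ≈ w * x, MSE loss, plain SGD.
# Assumes equal-size micro-batches; names are illustrative, not a library API.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def full_batch_step(w, xs, ys, lr):
    """One optimizer step on the whole batch at once (needs it all in memory)."""
    return w - lr * grad_mse(w, xs, ys)

def accumulated_step(w, xs, ys, lr, accum_steps):
    """The same update, built from `accum_steps` sequential micro-batches."""
    micro = len(xs) // accum_steps  # assumes len(xs) is divisible by accum_steps
    grad_buffer = 0.0
    for i in range(accum_steps):
        mx = xs[i * micro:(i + 1) * micro]
        my = ys[i * micro:(i + 1) * micro]
        # Scale by 1/accum_steps so the accumulated sum equals the full-batch
        # mean gradient instead of accum_steps times it.
        grad_buffer += grad_mse(w, mx, my) / accum_steps
    # A single optimizer step, taken only after every micro-batch is processed.
    return w - lr * grad_buffer

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w0, lr = 0.0, 0.01

w_full = full_batch_step(w0, xs, ys, lr)
w_accum = accumulated_step(w0, xs, ys, lr, accum_steps=4)
print(w_full, w_accum)  # equal up to floating-point rounding
```

Here the effective batch is 8 examples (micro-batch size 2 × 4 accumulation steps), but only 2 are processed at a time. The 1/accum_steps scaling reproduces the full-batch mean only when every micro-batch contributes equally. If micro-batches differ in size, for example in the number of non-padding tokens, the normalization has to use the total count across all micro-batches instead.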

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers