Definition
Adding up gradients from several small batches before updating the model, as a workaround when one big batch won't fit in memory.
A training technique that processes mini-batches sequentially and sums their gradients before a single optimizer step, simulating a larger effective batch on memory-constrained hardware.