Glossary · Term

FSDP

← all terms

Definition

A way to split a huge model across many GPUs so it fits in memory during training.

Fully Sharded Data Parallel, a distributed training scheme that shards parameters, gradients, and optimizer state across GPUs.

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers