Definition
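DeepSpeed is an open-source distributed training library for PyTorch, developed by Microsoft, that makes training very large models tractable on commodity GPU clusters. Its core technique is ZeRO (Zero Redundancy Optimizer), which shards optimizer state, gradients, and parameters across data-parallel workers instead of replicating all three on every GPU. It’s one of the standard ways teams get billion- and trillion-parameter models to fit in GPU memory and train.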
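
To make the effect of sharding concrete, here is a rough sketch of the per-GPU memory arithmetic behind ZeRO. It assumes mixed-precision Adam with the usual accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state (master weights, momentum, variance). The function name `zero_bytes_per_gpu` is hypothetical and is not part of the DeepSpeed API; the estimate covers model state only and ignores activations, temporary buffers, and fragmentation.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Approximate model-state bytes per GPU under a given ZeRO stage.

    Illustrative estimate only (assumes mixed-precision Adam):
      stage 0: nothing sharded (plain data parallelism)
      stage 1: optimizer state sharded
      stage 2: optimizer state and gradients sharded
      stage 3: optimizer state, gradients, and parameters sharded
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 master weights + Adam momentum + variance

    # Each stage divides one more component across the data-parallel group.
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + optim


if __name__ == "__main__":
    # Example: a 7.5B-parameter model on 64 GPUs.
    for stage in range(4):
        gb = zero_bytes_per_gpu(7.5e9, 64, stage) / 1e9
        print(f"ZeRO stage {stage}: {gb:.1f} GB of model state per GPU")
```

Under these assumptions, a 7.5B-parameter model needs about 120 GB of model state per GPU with no sharding, and about 1.9 GB per GPU with all three components sharded across 64 GPUs. That gap is why sharding decides whether a model fits at all.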