gradient · Glossary · AI Papers: A Deep Dive

Definition

Plain language

The direction to nudge a model's weights to make it do better next time.

As stated in the literature

The vector of partial derivatives of a loss with respect to parameters, used by optimizers to update weights during training.

Also called: gradients

Why it matters: It's the central signal that drives essentially all neural-network training, and nearly every optimization technique is about computing or shaping gradients better.

For example, after a model misclassifies a cat photo as a dog, the gradient tells the optimizer which weights to nudge up or down to make 'cat' more likely next time.

Heard on the show

“And the reason for buckets instead of a precise number is the part I'd underline — the thing consuming this reward is a language model, not a gradient.”

Episode 186 — How a Frozen Model Went From 2% to 77% on Physics Puzzles — Without Retraining

Mentioned in 55 episodes

Related concepts

Black-Box Optimization DeepSpeed Gradient Accumulation Loss Aggregation Policy Gradient Reward Variance Reward Variance

Related terms

loss parameter weights