loss · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A number that measures how wrong a model's outputs are, which training tries to make smaller.

As stated in the literature

A scalar objective function quantifying the discrepancy between model predictions and targets; gradients of the loss drive parameter updates.
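To make "gradients of the loss drive parameter updates" concrete, here is a minimal sketch, not taken from any episode or paper. It assumes a one-feature linear model, mean squared error as the loss, and plain gradient descent; the names `mse_loss`, `mse_grad`, and `train`, the learning rate, and the toy data are all illustrative assumptions.

```python
def mse_loss(w, b, xs, ys):
    """Scalar loss: mean squared error between predictions w*x + b and targets y."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def mse_grad(w, b, xs, ys):
    """Gradient of the MSE loss with respect to the parameters w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def train(xs, ys, lr=0.05, steps=500):
    """Plain gradient descent: step each parameter against its loss gradient."""
    w, b = 0.0, 0.0
    history = []  # loss recorded before each update
    for _ in range(steps):
        history.append(mse_loss(w, b, xs, ys))
        dw, db = mse_grad(w, b, xs, ys)
        w -= lr * dw
        b -= lr * db
    return w, b, history


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, history = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, first loss={history[0]:.3f}, last loss={history[-1]:.6f}")
```

The loss never appears in the model's predictions; it only supplies the direction in which each parameter is nudged, which is why changing the loss changes what the model ends up learning.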

Why it matters: Loss is the quantity the optimizer is actually driving down, so knowing what goes into it is essential to understanding what a model is learning.

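For example, a language model's cross-entropy loss of 2.3 on a held-out batch means the average negative log-probability it assigns to each true next token is 2.3. If the loss is measured in nats (natural log), that corresponds to a perplexity of about 10 (e^2.3), roughly the uncertainty of picking uniformly among 10 candidate tokens.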
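The arithmetic behind that example can be sketched as follows. This is an illustrative sketch only: it assumes the loss is reported in nats, and the helper names `cross_entropy` and `perplexity` and the probability values are hypothetical, not drawn from any real model.

```python
import math


def cross_entropy(probs_of_true_tokens):
    """Average negative log-probability (in nats) assigned to the true tokens."""
    n = len(probs_of_true_tokens)
    return -sum(math.log(p) for p in probs_of_true_tokens) / n


def perplexity(loss_nats):
    """Exponentiated loss: the effective number of equally likely choices."""
    return math.exp(loss_nats)


# Hypothetical probabilities a model gave to the correct next token
# at four positions in a held-out batch.
probs = [0.5, 0.1, 0.02, 0.25]
loss = cross_entropy(probs)
print(f"loss = {loss:.3f} nats, perplexity = {perplexity(loss):.2f}")

# A model that always gives the true token probability 0.1 has a loss of
# ln(10), about 2.3, which is where a perplexity of about 10 comes from.
print(f"uniform-over-10 loss = {cross_entropy([0.1] * 4):.4f}")
```

If a framework reports the loss in bits (log base 2) instead, the same conversion uses 2 raised to the loss rather than e.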

Heard on the show

“… Among the patches the search returns, one hacks at less than half the baseline rate with no loss in real task performance — while the top raw performer actually hacks a bit more, which is why the …”

Episode 199 — Finding a Model's Hidden Behaviors Without Knowing What You're Looking For

Mentioned in 39 episodes

Related concepts

AI Safety · AIMD Congestion Control · Contrastive Loss · Flow Matching · Representation Alignment · Scaling Laws · Self-Play / Self-Evolution · TracIn

Related terms

gradient · parameter