
influence function

Definition

Plain language

A statistical tool for estimating how a model's predictions would shift if a particular training example had been weighted differently.

As stated in the literature

A tool from classical statistics that estimates how a model's parameters or predictions change when the weight of a single training point is perturbed by a small amount.
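
One common way to write this down in the machine learning literature is sketched below. The symbols are assumptions introduced here for illustration, not notation from the episode: z_1, ..., z_n are the training points, L is a per-example loss, \hat\theta is the fitted parameter vector, and the training objective is assumed twice differentiable and strictly convex so that the Hessian H can be inverted.

```latex
% Fit with training point z upweighted by a small amount \epsilon
\hat\theta_{\epsilon, z} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon \, L(z, \theta)

% Influence of z on the parameters
\left. \frac{d \hat\theta_{\epsilon, z}}{d \epsilon} \right|_{\epsilon = 0}
  = - H_{\hat\theta}^{-1} \, \nabla_{\theta} L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat\theta)

% Influence of z on the loss at a test point z_test (chain rule)
\left. \frac{d L(z_{\mathrm{test}}, \hat\theta_{\epsilon, z})}{d \epsilon} \right|_{\epsilon = 0}
  = - \nabla_{\theta} L(z_{\mathrm{test}}, \hat\theta)^{\top} \, H_{\hat\theta}^{-1} \, \nabla_{\theta} L(z, \hat\theta)
```

Under these assumptions, removing z from the training set corresponds to setting \epsilon = -1/n, so the effect of deleting a point is approximated by -1/n times its influence.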

Also called: influence functions

Why it matters: It's a key tool for data attribution, debugging dataset issues, and reasoning about how individual training examples shape model behavior.

For example, you can use an influence function to estimate which training documents most pushed a model toward a particular wrong answer, without retraining the model from scratch.
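
To make the "without retraining" idea concrete, here is a minimal sketch in Python with NumPy. It assumes a small ridge regression model, chosen only because its Hessian has a closed form. The function names (fit_ridge, removal_effect_on_prediction) and the synthetic data are hypothetical and exist only for this illustration. The sketch estimates how one test prediction would change if each training point were removed, then checks the estimate for the most influential point against an actual refit.

```python
import numpy as np


def fit_ridge(X, y, lam, weights=None):
    """Minimize (1/n) * sum_i w_i * 0.5 * (x_i @ theta - y_i)**2 + 0.5 * lam * ||theta||^2.

    Returns the fitted parameters and the Hessian of the objective.
    """
    n, d = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    hessian = (X.T * w) @ X / n + lam * np.eye(d)
    theta = np.linalg.solve(hessian, (X.T * w) @ y / n)
    return theta, hessian


def removal_effect_on_prediction(X, y, x_test, lam):
    """First-order estimate of the change in the prediction at x_test
    when each training point is removed, computed without refitting."""
    n = X.shape[0]
    theta, hessian = fit_ridge(X, y, lam)
    # Per-example loss gradients at the fitted parameters, shape (n, d).
    grads = (X @ theta - y)[:, None] * X
    # Influence of upweighting example i on the prediction:
    #   -x_test^T H^{-1} grad_i
    h_inv_x = np.linalg.solve(hessian, x_test)
    upweight_influence = -grads @ h_inv_x
    # Removing an example corresponds to a weight change of -1/n.
    return -upweight_influence / n


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=d)
lam = 0.1

predicted = removal_effect_on_prediction(X, y, x_test, lam)
most_influential = int(np.argmax(np.abs(predicted)))

# Ground truth for comparison: actually refit with that point's weight set to zero.
theta_full, _ = fit_ridge(X, y, lam)
loo_weights = np.ones(n)
loo_weights[most_influential] = 0.0
theta_loo, _ = fit_ridge(X, y, lam, loo_weights)
actual = x_test @ theta_loo - x_test @ theta_full

print("most influential training index:", most_influential)
print("estimated change:", predicted[most_influential])
print("actual change after refit:", actual)
```

For a model this small the refit is cheap, so the comparison is only a sanity check. The appeal of the estimate is that it ranks every training point with one Hessian solve, whereas the retraining route would need one refit per point. For large neural networks the Hessian cannot be formed or inverted exactly, so practical methods rely on approximations that this sketch does not cover.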

Heard on the show

“… kind of object that doesn't mean anything intuitive on its face — and they rewrite it using influence functions. …”

Episode 025 — The Missing Gradient Term That Predicts Sycophancy in RLHF

Mentioned in 1 episode

025
The Missing Gradient Term That Predicts Sycophancy in RLHF

Related terms

parameter weights