Glossary · Term

influence function

Definition

A tool from classical statistics that estimates, to first order and without retraining, how a model's parameters or predictions would change if the weight of a single training example were perturbed slightly. Setting the perturbation equal to removing the example gives an approximation of the leave-one-out effect of that example.
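The definition above can be made concrete with a small sketch. It assumes ridge regression (squared loss with an L2 penalty, written as a sum over training points) because the Hessian is then available in closed form; the function names `fit_ridge` and `removal_influence`, the synthetic data, and the choice of test point are all illustrative assumptions, not something taken from the episode. The sketch estimates the change in one prediction when a training point's weight goes from 1 to 0, then compares that against actually retraining without the point.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize 0.5 * sum_i (x_i @ theta - y_i)^2 + 0.5 * lam * ||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def removal_influence(X, y, lam, i, x_test):
    """First-order estimate of the change in the prediction at x_test
    if training point i were removed (its weight moved from 1 to 0)."""
    d = X.shape[1]
    theta = fit_ridge(X, y, lam)
    hessian = X.T @ X + lam * np.eye(d)    # Hessian of the full objective
    grad_i = (X[i] @ theta - y[i]) * X[i]  # gradient of point i's loss at theta
    # d(theta)/d(w_i) = -H^{-1} grad_i, and removal is a weight change of -1,
    # so the estimated parameter change is +H^{-1} grad_i.
    delta_theta = np.linalg.solve(hessian, grad_i)
    return x_test @ delta_theta

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
x_test = np.array([0.3, -0.1, 0.8])
lam, i = 1.0, 0

estimated = removal_influence(X, y, lam, i, x_test)

# Ground truth: retrain without point i and measure the prediction change.
keep = np.arange(len(y)) != i
actual = x_test @ (fit_ridge(X[keep], y[keep], lam) - fit_ridge(X, y, lam))

print(f"estimated change: {estimated:.6f}")
print(f"actual change:    {actual:.6f}")
```

For this particular model the estimate and the retrained value differ by a factor of (1 - h), where h is the leverage of the removed point, so the two agree closely when no single point dominates the fit. For models without a closed-form Hessian, the inverse-Hessian-vector product usually has to be approximated, which this sketch does not attempt.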

Also called: influence functions

Mentioned in 1 episode

  1. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF