Glossary · Term

TracIn

← all terms

Definition

A method for figuring out which training examples were most responsible for a specific prediction.

A training-data attribution technique that estimates the influence of training examples on predictions by tracking gradient inner products across checkpoints.

Mentioned in 1 episode

  1. 025
    The Missing Gradient Term That Predicts Sycophancy in RLHF

Related concepts