TracIn · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A method for figuring out which training examples were most responsible for a specific prediction.

As stated in the literature

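A training-data attribution technique that estimates the influence of each training example on a given prediction. It sums, over checkpoints saved during training, the inner product of the loss gradient at the training example and the loss gradient at the test example, weighted by the learning rate in effect at each checkpoint.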
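Written out, the checkpoint-based estimate (often called TracInCP) looks roughly like the following. The symbols are notation assumed here for illustration: z is a training example, z' is the test example, w_{t_i} are the weights at the i-th saved checkpoint, \eta_i is the learning rate around that checkpoint, and \ell is the loss.

```latex
\mathrm{TracInCP}(z, z') \;=\; \sum_{i=1}^{k} \eta_i \,
  \nabla_w \ell(w_{t_i}, z) \cdot \nabla_w \ell(w_{t_i}, z')
```

A positive score marks z as a proponent (training on it tended to lower the loss on z'), and a negative score marks it as an opponent. Setting z' = z gives the self-influence score.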

Why it matters: It supports debugging, data cleaning, and copyright attribution by linking model behavior back to specific training data.

For example, TracIn can be used to ask 'which training examples most pushed the model toward this hallucination?' and surface a handful of suspect documents.
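A query like this can be sketched on a toy model. The code below is a minimal illustration under stated assumptions, not the original authors' implementation: it uses a small logistic regression trained with plain SGD as a stand-in for a real model, and the names `loss_grad`, `train_with_checkpoints`, and `tracin_scores` are hypothetical helpers invented for this example.

```python
import numpy as np

def loss_grad(w, x, y):
    """Gradient of the logistic loss for one example (x, y) w.r.t. weights w."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def train_with_checkpoints(X, Y, lr=0.1, epochs=5, seed=0):
    """Plain SGD; saves a (weights, learning_rate) pair after every epoch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    checkpoints = []
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w = w - lr * loss_grad(w, X[i], Y[i])
        checkpoints.append((w.copy(), lr))
    return checkpoints

def tracin_scores(checkpoints, X_train, Y_train, x_test, y_test):
    """Sum over checkpoints of lr * <grad(train_i), grad(test)>."""
    scores = np.zeros(len(X_train))
    for w, lr in checkpoints:
        g_test = loss_grad(w, x_test, y_test)
        # Per-example training gradients, shape (n_train, n_features).
        p = 1.0 / (1.0 + np.exp(-X_train @ w))
        g_train = (p - Y_train)[:, None] * X_train
        scores += lr * (g_train @ g_test)
    return scores

# Toy data (an assumption for this sketch): positive features, label from feature 0.
rng = np.random.default_rng(42)
X = rng.uniform(0.1, 1.0, size=(20, 3))
Y = (X[:, 0] > 0.5).astype(float)

checkpoints = train_with_checkpoints(X, Y)
x_test, y_test = np.array([0.9, 0.5, 0.5]), 1.0
scores = tracin_scores(checkpoints, X, Y, x_test, y_test)

# Highest scores are the strongest proponents of this prediction.
top_proponents = np.argsort(-scores)[:3]
print("strongest proponents:", top_proponents)
```

In a real setting the per-example gradients would come from the model's autodiff framework, and the gradients are often restricted to a subset of layers to keep the computation tractable.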

Heard on the show

“It's exactly TracIn — a self-influence estimator that's been sitting in the data attribution literature since 2020.”

Episode 025 — The Missing Gradient Term That Predicts Sycophancy in RLHF

Mentioned in 1 episode

025
The Missing Gradient Term That Predicts Sycophancy in RLHF

Related terms

checkpoint · data attribution · gradient