Definition
During training, feeding the model the correct earlier tokens instead of letting it use its own predictions.
A training strategy where the model conditions on ground-truth previous tokens rather than its own outputs, enabling parallel training but creating exposure bias.