Definition
Training a smaller model to imitate a larger one so that it inherits much of the larger model's capability.
A training procedure that transfers behavior from a teacher model to a smaller student by training the student to match the teacher's outputs (for example, its predicted probabilities) or intermediate signals (for example, hidden-layer activations).
Also called: distill, distilled, self-distillation, distilling
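
Example
To make "match the teacher's outputs" concrete, here is a minimal sketch of one common output-matching objective: the KL divergence between temperature-softened teacher and student distributions. The function names, the temperature default of 2.0, and the T^2 scaling are assumptions chosen for illustration, not something the definition above prescribes; other objectives (such as matching intermediate activations) are equally valid.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Return log-probabilities for logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating, for stability
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: the T^2 factor is a common convention that keeps
    gradient magnitudes comparable across temperatures (an assumption here).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


# The loss is zero when the student reproduces the teacher's logits
# and grows as the student's distribution drifts away from the teacher's.
teacher = [1.0, 2.0, 3.0]
print(distillation_loss([1.0, 2.0, 3.0], teacher))
print(distillation_loss([3.0, 2.0, 1.0], teacher))
```

In practice this term is often combined with an ordinary loss on ground-truth labels, and the student's logits come from a trainable network rather than fixed lists.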