Definition
Salvaging failed AI training trajectories by training only on the parts where the agent was actually making progress.
A supervised fine-tuning method that uses an LLM to estimate per-step value along failed teacher trajectories and trains the student only on segments preceding the critical mistake.