credit-assignment SFT

Definition

Plain language

Salvaging failed AI agent runs by fine-tuning only on the portion of each run where the agent was still making progress.

As stated in the literature

A supervised fine-tuning method that uses an LLM to estimate per-step value along failed teacher trajectories and trains the student only on segments preceding the critical mistake.

Why it matters: It rescues training signal from runs that would otherwise be thrown away, while keeping the student from imitating the steps that actually caused the failure.

For example, a teacher trajectory fails on turn 18. An LLM judge rates the agent as still on track through turn 14, which places the critical mistake at turn 15, so the student is trained only on turns 1 through 14.
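The truncation step in that example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method as published: the `Step` type, the `trainable_prefix` function, the 0.5 threshold, and the rule "the critical mistake is the first turn whose estimated value drops below the threshold" are all hypothetical, and `toy_judge` is a hard-coded stand-in for a real LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Step:
    turn: int    # 1-indexed turn number in the teacher trajectory
    action: str  # what the teacher did at this turn


# Hypothetical judge interface: given the trajectory up to and including
# the current turn, return an estimated value in [0, 1].
ValueFn = Callable[[Sequence[Step]], float]


def trainable_prefix(steps: Sequence[Step], value_fn: ValueFn,
                     threshold: float = 0.5) -> List[Step]:
    """Return the steps before the first turn whose estimated value
    falls below `threshold` (assumed here to mark the critical mistake)."""
    kept: List[Step] = []
    for i, step in enumerate(steps):
        if value_fn(steps[: i + 1]) < threshold:
            break  # critical mistake found; drop this turn and all later ones
        kept.append(step)
    return kept


def toy_judge(prefix: Sequence[Step]) -> float:
    # Stand-in for an LLM call: "on track" through turn 14, off track after.
    return 0.8 if prefix[-1].turn <= 14 else 0.2


# A failed 18-turn teacher trajectory, as in the example above.
trajectory = [Step(turn=t, action=f"action-{t}") for t in range(1, 19)]
kept = trainable_prefix(trajectory, toy_judge)
print(kept[0].turn, kept[-1].turn, len(kept))  # prints: 1 14 14
```

Only the steps in `kept` would be turned into supervised fine-tuning examples for the student. The fixed threshold is one possible design choice; a judge that flags the largest drop in value between consecutive turns would be another, and the sketch does not claim to match what the literature does.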

Heard on the show

“The first is called credit-assignment SFT, and it's about the problem of failed trajectories.”

Episode 047 — When Agent Benchmarks Lie: The Harness Problem in Open-Source AI

Mentioned in 1 episode

047
When Agent Benchmarks Lie: The Harness Problem in Open-Source AI

Related terms

SFT trajectory