
SFT

Definition

Plain language

Training a model by showing it examples of correct answers and having it imitate them.

As stated in the literature

Supervised Fine-Tuning, a post-training stage in which a model is trained on labeled demonstrations via standard next-token prediction.
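As a rough illustration of "standard next-token prediction", the objective can be sketched as the average negative log-likelihood the model assigns to the demonstration's tokens. The function name `sft_loss`, the mask, and the toy probabilities below are assumptions made for this sketch, not taken from any particular library; scoring only the response tokens is a common choice, not a universal one.

```python
import math

def sft_loss(token_probs, loss_mask):
    """Mean negative log-likelihood over the tokens where loss_mask is 1.

    token_probs[i] is the probability the model assigned to the i-th
    target token of the demonstration. loss_mask[i] is 1 for response
    tokens and 0 for prompt tokens (an assumption of this sketch).
    """
    if len(token_probs) != len(loss_mask):
        raise ValueError("token_probs and loss_mask must have equal length")
    picked = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    if not picked:
        raise ValueError("loss_mask selects no tokens")
    return sum(picked) / len(picked)

# Toy numbers (made up): two prompt tokens, then three response tokens.
probs = [0.9, 0.8, 0.5, 0.25, 0.5]
mask = [0, 0, 1, 1, 1]
loss = sft_loss(probs, mask)
print(round(loss, 4))
```

Training lowers this number by raising the probability of each demonstrated token, which is what "imitating the examples" amounts to in practice.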

Also called: supervised fine-tuning

Why it matters: It is among the simplest and cheapest ways to add new behavior to a base model, and it is typically the first step of a post-training pipeline.

For example, to teach a model to refuse harmful requests politely, you fine-tune it on thousands of (harmful prompt, polite refusal) pairs.
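One way such pairs might be turned into training examples is sketched below. The whitespace "tokenizer", the helper name `build_example`, and the sample pair are all hypothetical stand-ins; a real pipeline would use the model's own tokenizer and chat template. The value -100 mirrors a common convention for labels the loss should skip, so that only the response is imitated.

```python
IGNORE_INDEX = -100  # label value the loss skips (a common convention)

def build_example(prompt, response, vocab):
    """Turn one (prompt, response) pair into input ids and labels.

    Prompt positions get IGNORE_INDEX as their label, so only the
    response tokens contribute to the loss. The one-position shift
    between inputs and targets is left to the training code.
    """
    # Toy tokenizer: split on whitespace and assign ids on first sight.
    prompt_ids = [vocab.setdefault(w, len(vocab)) for w in prompt.split()]
    response_ids = [vocab.setdefault(w, len(vocab)) for w in response.split()]
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}

# Made-up pair for illustration only.
vocab = {}
example = build_example(
    "how do I pick a lock",
    "sorry I cannot help with that",
    vocab,
)
print(example["labels"])
```

Repeating this over thousands of pairs yields the dataset the model is fine-tuned on.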

Heard on the show

“First, supervised fine-tuning on all those traces — that installs the shape of good reasoning, the structure of the loop.”

Episode 187 — An 8-Billion Agent That Beats Models 80 Times Its Size By Looking Things Up

Mentioned in 21 episodes

187 · An 8-Billion Agent That Beats Models 80 Times Its Size By Looking Things Up
167 · How Teaching an AI to Predict, Not Act, Made It a Better Actor
166 · A Router That Beats the Frontier Models It Calls
163 · Why Training Only on Perfect Solutions Cripples a Model's Reasoning
156 · Why More Human Demonstrations Made a Computer-Use Agent Worse
141 · How Two Tokens Reopened a Reasoning Method the Field Had Given Up On
099 · How an Open-Book Trick Teaches a Model to Catch Its Own Mistakes
091 · When Better Fine-Tuning Can't Help: A Geometric Impossibility in LLM Causal Reasoning
084 · Terminal Agents Get Free Supervision From The Tokens We've Been Throwing Away
082 · Training a Deep Research Agent on 8,000 Synthetic Tasks: The Rubric Tree Trick
080 · How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents
066 · Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
048 · How a 30B Open Model Reached Olympiad Gold With the Right Recipe
047 · When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
032 · A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
022 · Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
021 · Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
011 · When RL Actually Teaches Agents Something New, And When It Doesn't
009 · How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers
007 · Exploration Hacking: When Models Sabotage Their Own RL Training
003 · How to Pick the Best of Sixteen Coding Agent Rollouts

Related concepts

Post-Training, Supervised Fine-Tuning

Related terms

post-training, token