Concept · 12 episodes

Supervised Fine-Tuning

Definition

SFT (Supervised Fine-Tuning) trains a pretrained model on (input, target output) pairs to teach a specific behavior or format. It is the simplest post-training method and, in most modern alignment pipelines, the first stage, run before any reinforcement learning (RL).
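
To make the definition concrete, here is a minimal sketch of the objective SFT usually optimizes: next-token cross-entropy averaged over the target tokens only. The function names (`log_softmax`, `sft_loss`) and the `loss_mask` convention are assumptions made for illustration, not taken from any episode or library, and masking out the input tokens is a common choice rather than a universal one.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sft_loss(step_logits, token_ids, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1.

    step_logits[t] holds the model's scores over the vocabulary for the
    token at position t, token_ids[t] is the token actually observed there,
    and loss_mask[t] is 1 for target (output) tokens and 0 for input tokens.
    Assumption: input tokens are excluded from the loss.
    """
    total, count = 0.0, 0
    for logits, tok, keep in zip(step_logits, token_ids, loss_mask):
        if keep:
            total -= log_softmax(logits)[tok]
            count += 1
    if count == 0:
        raise ValueError("loss_mask selects no target tokens")
    return total / count


# Toy example: vocabulary of 3 ids, 2 input tokens followed by 2 target tokens.
token_ids = [0, 1, 2, 1]
loss_mask = [0, 0, 1, 1]
uniform = [[0.0, 0.0, 0.0] for _ in token_ids]  # model with no preference
loss = sft_loss(uniform, token_ids, loss_mask)   # equals log(3) here
```

With uniform scores, every target token gets probability 1/3, so the loss is log(3). Training pushes the loss down by raising the probability of each target token, and the scores at masked input positions do not affect it.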

Episodes covering this

  1. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
    Wang, Liu, Wang et al. · Sapient Intelligence · 21 min · May 24, 2026
  2. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
    Xu, Wen, Li · Peking University · 23 min · May 22, 2026
  3. 070
    When Models Know the Answer But Say the Wrong Thing Anyway
    Yeom, Sok, Kim et al. · Graduate School of Data Science · 22 min · May 22, 2026
  4. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
    Hu, Zhang, Xu et al. · Tongyi Lab · 26 min · May 22, 2026
  5. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe
    Li, Zhan, Zhang et al. · Shanghai AI Laboratory / The Chinese University of Hong Kong · 31 min · May 16, 2026
  6. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
    Peng, Yao, Wu et al. · Microsoft Research · 28 min · May 15, 2026
  7. 043
    When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway
    Mayne, McKinney, Dubiński et al. · University of Oxford · 18 min · May 14, 2026
  8. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
    Aviss · Fifth Dimension · 23 min · May 09, 2026
  9. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
    Du, Ye, Tang et al. · Shanghai Jiao Tong University · 14 min · May 06, 2026
  10. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM
    Li, Xin, Xiao et al. · University of Washington · 26 min · May 06, 2026
  11. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
    Zhai, Yan, Shao et al. · Fudan University · 23 min · May 02, 2026
  12. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers
    Limozin, Durech, Hoefler et al. · ETH AI Center · 23 min · May 02, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.