fine-tuning

Definition

Taking a model that already knows a lot and giving it more lessons so it gets better at one specific task.

Continuing the training of a pretrained model on a smaller task-specific dataset, updating its weights to adapt the general model to a particular use case.
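As a rough illustration of the definition above, the sketch below continues gradient descent from existing weights on a small task-specific dataset. It is a toy under stated assumptions, not how any particular model is fine-tuned: a one-feature linear model stands in for a large pretrained network, and the names `fine_tune`, `mse`, `pretrained_w`, `pretrained_b`, and `task_data` are all hypothetical.

```python
# Toy sketch: a one-feature linear model stands in for a large pretrained network.

def mse(w, b, data):
    """Mean squared error of the model y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, b, data, lr=0.05, steps=500):
    """Continue gradient descent from existing weights (w, b) on a new dataset."""
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical "pretrained" weights, assumed to come from earlier general training.
pretrained_w, pretrained_b = 1.0, 0.0

# Small task-specific dataset (assumed for illustration); it follows y = 1.5x + 0.5.
task_data = [(0.0, 0.5), (1.0, 2.0), (2.0, 3.5), (3.0, 5.0)]

loss_before = mse(pretrained_w, pretrained_b, task_data)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_data)
loss_after = mse(tuned_w, tuned_b, task_data)

print(f"loss before fine-tuning: {loss_before:.4f}")
print(f"loss after fine-tuning:  {loss_after:.6f}")
```

The point of the sketch is the starting position: training resumes from the pretrained weights rather than from scratch, so the small dataset only has to adapt the model to the new task.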

Also called: Fine-tuning, fine-tune, fine-tuned, finetuning, finetune, finetuned, finetunes

Mentioned in 19 episodes

  1. 078
    Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
  2. 074
    How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
  3. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
  4. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
  5. 054
    When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window
  6. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
  7. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe
  8. 043
    When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway
  9. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
  10. 038
    How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial
  11. 032
    A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
  12. 026
    What RL Actually Does to Language Models, at the Token Level
  13. 022
    Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
  14. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
  15. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM
  16. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers
  17. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
  18. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training
  19. 004
    The Sycophancy Circuit That Survives Alignment Training

Related concepts