Glossary · Term

ablation

Definition

Plain language

Turning off part of a model on purpose to see what stops working.

As stated in the literature

Removing or zeroing out a model component (a head, layer, or feature) to test whether the rest of the network still produces the original behavior.

Also called: ablations, ablating, ablated

Why it matters: Ablation is a causal test. If a behavior breaks when a component is removed, that is evidence the component is necessary for the behavior rather than merely correlated with it.

For example, researchers might switch off a particular attention head and check whether the model still solves arithmetic problems correctly.
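
To make the idea concrete, here is a minimal sketch of zero ablation on a toy two-unit network. Everything in it is an illustrative assumption rather than something from the episodes: the `forward` function, the hand-picked weights, and the choice to measure the effect as the absolute change in output. Real studies ablate heads, layers, or features inside a trained model, and often replace the activation with a mean value instead of zero.

```python
def forward(x, w_in, w_out, ablate=None):
    """Toy two-layer ReLU network (hypothetical); optionally zero one hidden unit."""
    # Each row of w_in holds the input weights for one hidden unit.
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row))) for row in w_in]
    if ablate is not None:
        hidden[ablate] = 0.0  # zero ablation: switch this unit off
    return sum(h * w for h, w in zip(hidden, w_out))


# Hand-picked weights (assumed for illustration): unit 0 carries most of the output.
w_in = [[1.0, 0.0],
        [0.0, 1.0]]
w_out = [2.0, 0.1]
x = [1.0, 1.0]

baseline = forward(x, w_in, w_out)

# Ablate each hidden unit in turn and record how far the output moves.
effect = {
    unit: abs(baseline - forward(x, w_in, w_out, ablate=unit))
    for unit in range(len(w_in))
}

# The unit whose removal changes the output most is the one the behavior depends on.
most_important = max(effect, key=effect.get)
print(effect, most_important)
```

In this toy setup, removing unit 0 moves the output far more than removing unit 1, which is the kind of contrast an ablation study looks for.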

Heard on the show

“Best detail: math problems done with written-out steps survive the ablation far better than the same problems done in the model's head.”

Episode 203 — The Thought a Model Doesn't Say — and the Lens That Reads It

Mentioned in 84 episodes

203
The Thought a Model Doesn't Say — and the Lens That Reads It
200
The One Mechanism That Turns Twenty AI Clones Into an Actual Team
181
How to Backpropagate Blame Through a Team of Chatbots — And When It Backfires
178
How an AI Reviewer Learned to Stop Going Easy on AI Writing
177
Why Raw Profiler Data Made an AI Worse at Writing GPU Code
173
The Free Step-Level Grader Hiding in Every RL Training Run
169
Why Better Bug Reports Can Make AI Coding Agents Worse
168
When Turning Experience Into Code Makes Your AI Agent Dumber
167
How Teaching an AI to Predict, Not Act, Made It a Better Actor
162
The Empty-Lake Proof: Why More Rollouts Stop Helping Reasoning Models
161
A Robot That Plays Before You Give It a Job, And Why That Beats Retrying
159
Can a Coding Agent Run Its Own Robot Experiments Overnight, With No Human Resetting the Scene?
156
Why More Human Demonstrations Made a Computer-Use Agent Worse
155
Why a Flawless Demo Makes a Worse Computer-Using Agent, And the Fix
154
How a 7B Model Out-Investigates a 72B One by Choosing What to Look At
151
Why More Experience Made This AI Agent Worse, And How to Fix It
150
Don't Kill the Loser: A Different Way to Handle Two AI Agents Colliding
147
Agents Fail at the Body, Not the Brain: A Self-Rewriting Scaffold That Lifts a 9B Model 44 Points
143
When a Model Notices You Forged Its Own Words, And Why That Breaks Safety Tests
139
When Optimizing One GPU Kernel Quietly Breaks the Whole System
131
Why Autonomous Research Agents Forget Their Own Lessons, and Arbor's Fix
130
Why AI Agents Coordinate Better Through a Shared Board Than a Boss
129
How a Crowd of Anonymous AI Agents Broke a 40-Year Math Record
127
What Diffusion Language Models Were Missing: A Map, Not an Algorithm
122
When Your Coding Agent Lies About the Fix: Verifying the Plan Before the Model Runs
121
When the Agent Says It's Done But Nothing Happened: Debugging the Harness, Not the Model
120
How an AI Agent Rewrites Its Own Tools, Without an Answer Key
115
Teaching a Phone Agent to Reason Silently, And Keeping It Honest
114
Agents That Rewrite Their Own Weights Instead of Just Taking Notes
111
How a 4B Web Agent Beat Models 60x Its Size on 500 Demonstrations
109
An AI Got Caught Reading the Answer Key, And Why That Catch Matters
107
How a Market of Crippled AI Agents Outscored One Unrestricted Model
106
Giving Agents a Notebook Instead of New Weights: How ExpGraph Lets Frozen Models Learn
105
The Trojan Is Your Agent's Memory: Why Single-Step Defenses Miss Persistent Attacks
101
Treating Math Formalization Like a Codebase, and Where the Agents Cheat
100
How a Prompt Wrapper Lets a Frontier Model Play Poker Like an Expert
099
How an Open-Book Trick Teaches a Model to Catch Its Own Mistakes
095
Seven Wins to Zero: How Organizing AI Agents Like a Lab Changes the Search
091
When Better Fine-Tuning Can't Help: A Geometric Impossibility in LLM Causal Reasoning
090
How MiniMax-M2 Bets That Sparsity Plus Verifiable Rewards Can Match Frontier Agents
089
When AI-Written Papers Read Well But the Evidence Underneath Is Broken
088
Two Levers for Self-Improving AI: When Rewriting Code Isn't Enough
083
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
082
Training a Deep Research Agent on 8,000 Synthetic Tasks: The Rubric Tree Trick
081
When Reasoning Models Decide Before They Think: Detecting and Fixing Premature Confidence
080
How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents
079
An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models
078
Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
077
Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
076
Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math
075
Growing Code and Proof Together: Verified Systems in Ten Hours Instead of a Year
074
How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
071
When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
065
One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery
064
When Agent Memory Stops Being a Database and Starts Being a Skill
060
When Splitting One Model Across Three Agents Doubles Its Accuracy
058
Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
057
How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
055
Why LLM Judges Flip Their Verdicts When You Change the Question Format
054
When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window
048
How a 30B Open Model Reached Olympiad Gold With the Right Recipe
046
When the AI Optimizer Edits the Grade Book: Why Harnessing Evolution Needs a Wall
045
When a Frontier Model Talks Its Own Twin Into Climate Denial
043
When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway
042
An Agentic Scientific Computing System That Actually Remembers What It Learns
041
When the Iteration Teaches the Model to Skip the Iteration
040
Two Frozen Models Learn to Whisper: Coupling Through Hidden States
036
Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
034
Catching Multi-Agent Deadlocks Before Deployment With a 40-Year-Old Tool
033
Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
032
A Sticky-Note for Every Layer: Letting Transformers Remember What They Were Just Thinking
024
An AI Agent That Found 28 Zero-Days in Windows — And What Made It Work
023
Why a Small Agent Confidently Overwrites Memories It Doesn't Understand
022
Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
021
Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
020
The Compliance Gap: Why AI Says Yes and Does No
018
Language Models Compute the Rational Move, Then Override It
016
Why Your Coding Agent Stalls While the GPU Runs Hot
014
Why a Constrained Pipeline Beat a Full Coding Agent at Finding Bugs 30-to-1
013
Why Search Keeps Rediscovering the Same Workflow, and What That Means
012
Why AI Coding Agents Keep Trying to Debug Without a Debugger
010
When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
008
Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
004
The Sycophancy Circuit That Survives Alignment Training

Related terms

feature