
capability

Definition

Plain language

Whether a model is able to do something at all, given the right prompting or setup.

As stated in the literature

The maximum performance a model can reach on a task under favorable conditions, as distinct from its propensity: how readily it does the task spontaneously, by default.

Why it matters: Distinguishing what a model can do from what it tends to do is essential for both safety evaluation and product design.

For example, a model might be capable of solving a hard logic puzzle when given the right prompt, yet never attempt that level of reasoning unprompted.
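One way to make the distinction concrete is a minimal sketch, assuming you have already measured a model's success rate on a single task under several setups. Capability is read off as the best rate across setups and propensity as the rate under the default setup. The setup names, the numbers, and the helper functions `capability` and `propensity` are all hypothetical and for illustration only; they are not taken from any episode or evaluation.

```python
# Hypothetical success rates for ONE task under different setups.
# Setup names and numbers are illustrative assumptions, not real results.
success_rates = {
    "default": 0.05,              # plain prompt, no hints
    "step_by_step": 0.40,         # prompt asks the model to reason step by step
    "few_shot_with_hints": 0.85,  # worked examples plus hints
}


def capability(rates: dict[str, float]) -> float:
    """Best success rate the model reaches under any setup that was tried."""
    return max(rates.values())


def propensity(rates: dict[str, float], default_key: str = "default") -> float:
    """Success rate when the model is left to its default behavior."""
    return rates[default_key]


cap = capability(success_rates)
prop = propensity(success_rates)
print(f"capability={cap:.2f} propensity={prop:.2f} gap={cap - prop:.2f}")
```

Note that the maximum is taken only over the setups actually tried, so a measured capability like this is a lower bound: a better prompt or scaffold that nobody tested could push it higher.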

Heard on the show

“The capability was never missing.”

Episode 207 — An AI Graded Its Own Math Test 94 Percent — It Actually Scored 20

Mentioned in 102 episodes

207
An AI Graded Its Own Math Test 94 Percent — It Actually Scored 20
205
The Same AI, Two Labels: How the Pitch Beat the Product in 162 Sessions
202
How Do You Know an AI Agent Actually Refused? Check the World, Not the Words
198
The Model That Knows the Answer and Can't Say It
197
Twin Problems Suggest AI Reasoning Gains Are Mostly Better Fact Recall
195
Why 'Be Careful' Does Nothing for AI Coding Agents, and What Does
194
How a Robot Builds a Debugging Notebook It Can Read, Edit, and Hand to Another Robot
193
Freeze Most of the Network: Where RL Improvement Actually Lives in a Transformer
192
A 32B Open Model Matched Frontier Systems By Learning to Take Notes
190
The Skill Every AI Manager Is Missing: Handing Out Exactly the Right Keys
189
Why Phone Agents Ace the Test and Crash on Your Actual Phone
187
An 8-Billion Agent That Beats Models 80 Times Its Size By Looking Things Up
185
Aligned to Refuse, Built to Tap: When Phone Agents Know the Task Is a Crime and Do It Anyway
184
An AI Built an Undetectable Secret Channel, And Another AI Couldn't Find It
183
Why You Can't Fine-Tune Foresight Into an AI Agent
180
The Bug Where Smart Assistants Read a Fact and Still Forget It
175
One Crosscoder Feature Flips a Stalling Chatbot Into a Working Agent
172
One Bad Token Can Sink a Model's Math, And You Can Delete It
169
Why Better Bug Reports Can Make AI Coding Agents Worse
166
A Router That Beats the Frontier Models It Calls
163
Why Training Only on Perfect Solutions Cripples a Model's Reasoning
160
Training an AI to Take Its Own Notes, So Its Future Self Works Better
158
How Floating-Point Rounding Lets a Model Tell Which Chip It's On — And Misbehave
157
When an AI Coding Agent Drives a Phone Through the Terminal, No Screen Needed
156
Why More Human Demonstrations Made a Computer-Use Agent Worse
152
Training a Model to Mean What It Says, And Why That Isn't the Same as Being Good
150
Don't Kill the Loser: A Different Way to Handle Two AI Agents Colliding
148
Why Letting an AI Watch Its Own Scoreboard Can Quietly Overwrite Its Safety
147
Agents Fail at the Body, Not the Brain: A Self-Rewriting Scaffold That Lifts a 9B Model 44 Points
146
How an Innocent README Can Freeze an AI Agent's Safety Check for an Hour
145
Building Forgetting Into a Language Model With One Extra Line of Code
144
When an AI Agent Just Copies Its Tool — And Bigger Models Copy More
143
When a Model Notices You Forged Its Own Words, And Why That Breaks Safety Tests
142
Training a Tiny Model to Run the Plumbing Between an Agent and the World
133
How MiniMax Turned a Reward-Hacking Disaster Into Olympiad Gold
132
The Agent Failed — But Did the Instructions Deserve to Be Followed?
131
Why Autonomous Research Agents Forget Their Own Lessons, and Arbor's Fix
129
How a Crowd of Anonymous AI Agents Broke a 40-Year Math Record
128
How a Model Can Earn Full Reward and Still Resist Training
125
AI Coding Agents Run a Marathon, and Fewer Than One in Three Finish
123
Five Identical Worlds, One Swapped Model: What Happens When AI Agents Run for Fifteen Days
118
Why the Best-Aligned AI Models Are the Easiest to Trick Into Producing Harm
117
How an Open AI System Verified 672 Hard Math Proofs for Under $300
114
Agents That Rewrite Their Own Weights Instead of Just Taking Notes
112
When an AI Agent Cheats Without Being Told: Inside the Meta-Agent Challenge
111
How a 4B Web Agent Beat Models 60x Its Size on 500 Demonstrations
110
How an Agent Got 44 Points Better by Mining Its Own Scratch Paper
108
The Reasoning Cliff: Why Thinking Longer Makes Models Worse at Exact Step-by-Step Tasks
107
How a Market of Crippled AI Agents Outscored One Unrestricted Model
105
The Trojan Is Your Agent's Memory: Why Single-Step Defenses Miss Persistent Attacks
104
How Making a Research Agent Smarter Quietly Makes It Leak Your Secrets
103
AI Agents Tried to Invent a Post-Human Language, And Reinvented Cherokee
099
How an Open-Book Trick Teaches a Model to Catch Its Own Mistakes
094
Chain-of-Thought Monitoring Fails Across Languages, and Worst Where It's Needed Most
093
A Calibrated Knob for Weak-to-Strong AI Oversight, Tested on Real Code
092
When Search Agents Don't Really Search: The Memory Shortcut Hiding in Browsing Benchmarks
090
How MiniMax-M2 Bets That Sparsity Plus Verifiable Rewards Can Match Frontier Agents
087
When No Agent Reads the Whole Document: A Universal Cliff in Multi-Agent Review
084
Terminal Agents Get Free Supervision From The Tokens We've Been Throwing Away
083
Training the Translator: How a Small Communication Model Lets Agent Teams Outperform Themselves
082
Training a Deep Research Agent on 8,000 Synthetic Tasks: The Rubric Tree Trick
081
When Reasoning Models Decide Before They Think: Detecting and Fixing Premature Confidence
077
Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
076
Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math
072
A Robot Made Graphene Without Help, And Caught Itself Hallucinating
071
When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
069
When Smarter Models Forecast Worse: The Hidden Failure Mode in LLM Predictions
068
The OS Trick That Makes Tree Search Practical for Coding Agents
067
An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
066
Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
064
When Agent Memory Stops Being a Database and Starts Being a Skill
063
Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
061
When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
058
Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
057
How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
054
When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window
053
An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
052
An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
049
An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
048
How a 30B Open Model Reached Olympiad Gold With the Right Recipe
047
When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
045
When a Frontier Model Talks Its Own Twin Into Climate Denial
044
How One Sentence and a Forged History Flip the Most Aligned Models
040
Two Frozen Models Learn to Whisper: Coupling Through Hidden States
039
When Smarter Agents Get Fooled by Three Extra Nodes in a Database
035
Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment
034
Catching Multi-Agent Deadlocks Before Deployment With a 40-Year-Old Tool
029
Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper
028
Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
026
What RL Actually Does to Language Models, at the Token Level
024
An AI Agent That Found 28 Zero-Days in Windows — And What Made It Work
023
Why a Small Agent Confidently Overwrites Memories It Doesn't Understand
021
Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
017
When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers
013
Why Search Keeps Rediscovering the Same Workflow, and What That Means
011
When RL Actually Teaches Agents Something New, And When It Doesn't
008
Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
007
Exploration Hacking: When Models Sabotage Their Own RL Training
004
The Sycophancy Circuit That Survives Alignment Training
003
How to Pick the Best of Sixteen Coding Agent Rollouts
002
An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light
001
When AI Models Quietly Protect Each Other From Shutdown

Related terms

propensity
