Glossary · Term

capability

Definition

Whether a model is able to do something at all, given the right prompting or setup.

The maximum performance a model can reach on a task under favorable conditions, as opposed to its propensity: how likely it is to perform the task spontaneously.

Mentioned in 40 episodes

  1. 077
    Reading a Model's Confidence Curve to Decide When Chain-of-Thought Is Worth It
  2. 076
    Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math
  3. 072
    A Robot Made Graphene Without Help, And Caught Itself Hallucinating
  4. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
  5. 069
    When Smarter Models Forecast Worse: The Hidden Failure Mode in LLM Predictions
  6. 068
    The OS Trick That Makes Tree Search Practical for Coding Agents
  7. 067
    An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
  8. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
  9. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
  10. 063
    Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
  11. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
  12. 058
    Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
  13. 057
    How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
  14. 054
    When Models Learn the Monitor Exists, the Reasoning Trace Stops Being a Window
  15. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
  16. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
  17. 049
    An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
  18. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe
  19. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
  20. 045
    When a Frontier Model Talks Its Own Twin Into Climate Denial
  21. 044
    How One Sentence and a Forged History Flip the Most Aligned Models
  22. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
  23. 039
    When Smarter Agents Get Fooled by Three Extra Nodes in a Database
  24. 035
    Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment
  25. 034
    Catching Multi-Agent Deadlocks Before Deployment With a 40-Year-Old Tool
  26. 029
    Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper
  27. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
  28. 026
    What RL Actually Does to Language Models, at the Token Level
  29. 024
    An AI Agent That Found 28 Zero-Days in Windows — And What Made It Work
  30. 023
    Why a Small Agent Confidently Overwrites Memories It Doesn't Understand
  31. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
  32. 017
    When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers
  33. 013
    Why Search Keeps Rediscovering the Same Workflow, and What That Means
  34. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
  35. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
  36. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training
  37. 004
    The Sycophancy Circuit That Survives Alignment Training
  38. 003
    How to Pick the Best of Sixteen Coding Agent Rollouts
  39. 002
    An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light
  40. 001
    When AI Models Quietly Protect Each Other From Shutdown

Related concepts