Glossary · Term

agent


Definition

An AI system that takes actions in a loop — picking what to do, doing it, seeing the result, and continuing — rather than just answering a single question.

In modern LLM usage, a system that wraps a model in an action-observation loop with tool use, often via ReAct-style scaffolding (interleaving written reasoning steps with tool calls and their results) for long-horizon tasks.

Also called: agents, agentic
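The action-observation loop in the definition can be made concrete with a minimal sketch. It assumes a ReAct-style text protocol: the model writes `Thought:` and `Action: tool[argument]` lines, and the scaffold runs the named tool and replies with an `Observation:` line, repeating until the model writes `Final Answer:`. The tool names (`lookup`, `add`), the regexes, and `scripted_model` are all hypothetical illustrations. `scripted_model` replays a fixed plan in place of a real LLM call. None of this is any particular framework's API.

```python
import re
from typing import Callable, Dict

# Hypothetical toy tools: each takes a string argument and returns a string.
FACTS = {"capital of France": "Paris"}

def lookup(query: str) -> str:
    """Toy knowledge-base lookup standing in for a real search tool."""
    return FACTS.get(query.strip(), "no result")

def add(args: str) -> str:
    """Adds comma-separated integers, e.g. '2, 3' -> '5'."""
    return str(sum(int(x) for x in args.split(",")))

TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lookup, "add": add}

# Assumed text protocol: "Action: name[argument]" and "Final Answer: ..." lines.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.*)$", re.MULTILINE)

def run_agent(model: Callable[[str], str], task: str, max_steps: int = 5) -> str:
    """Loop: the model picks an action, the scaffold executes it, and the
    observation is appended to the transcript before the model is called again."""
    transcript = f"Question: {task}\n"
    for _ in range(max_steps):
        step = model(transcript)              # pick what to do
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:                             # the model decided it is finished
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            observation = "could not parse an action"
        else:
            name, arg = action.group(1), action.group(2)
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"unknown tool {name!r}"  # do it
        transcript += f"Observation: {observation}\n"  # see the result, then continue
    return "gave up after max_steps"

def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: picks the next step of a fixed plan
    based on how many observations the transcript already contains."""
    plan = [
        "Thought: I should look this up.\nAction: lookup[capital of France]",
        "Thought: The tool returned the answer.\nFinal Answer: Paris",
    ]
    n = transcript.count("Observation:")
    return plan[min(n, len(plan) - 1)]

print(run_agent(scripted_model, "What is the capital of France?"))  # → Paris
```

The `max_steps` cap is the simplest guard against an agent that never reaches a final answer. Many production systems use structured (for example JSON) tool calls instead of parsing free text, but the cycle of picking an action, running it, and feeding the result back is the same.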

Mentioned in 59 episodes

  1. 078
    Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
  2. 076
    Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math
  3. 075
    Growing Code and Proof Together: Verified Systems in Ten Hours Instead of a Year
  4. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
  5. 072
    A Robot Made Graphene Without Help, And Caught Itself Hallucinating
  6. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
  7. 068
    The OS Trick That Makes Tree Search Practical for Coding Agents
  8. 067
    An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
  9. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
  10. 065
    One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery
  11. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
  12. 063
    Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
  13. 062
    Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety
  14. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
  15. 060
    When Splitting One Model Across Three Agents Doubles Its Accuracy
  16. 059
    Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward
  17. 058
    Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
  18. 057
    How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
  19. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
  20. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
  21. 051
    Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
  22. 049
    An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
  23. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
  24. 046
    When the AI Optimizer Edits the Grade Book: Why Harnessing Evolution Needs a Wall
  25. 045
    When a Frontier Model Talks Its Own Twin Into Climate Denial
  26. 044
    How One Sentence and a Forged History Flip the Most Aligned Models
  27. 042
    An Agentic Scientific Computing System That Actually Remembers What It Learns
  28. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
  29. 039
    When Smarter Agents Get Fooled by Three Extra Nodes in a Database
  30. 036
    Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
  31. 035
    Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment
  32. 034
    Catching Multi-Agent Deadlocks Before Deployment With a 40-Year-Old Tool
  33. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval
  34. 031
    When Your AI Assistant Won't Let Go of Old Facts About You
  35. 030
    Why Your AI Agent Won't Stop Working — and Each Model Falls for a Different Trap
  36. 029
    Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper
  37. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
  38. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
  39. 026
    What RL Actually Does to Language Models, at the Token Level
  40. 024
    An AI Agent That Found 28 Zero-Days in Windows — And What Made It Work
  41. 023
    Why a Small Agent Confidently Overwrites Memories It Doesn't Understand
  42. 022
    Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
  43. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
  44. 020
    The Compliance Gap: Why AI Says Yes and Does No
  45. 018
    Language Models Compute the Rational Move, Then Override It
  46. 017
    When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers
  47. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot
  48. 015
    The Audit Number Isn't What You Think: Sycophancy and the Case Against Single-Prompt Bias Tests
  49. 014
    Why a Constrained Pipeline Beat a Full Coding Agent at Finding Bugs 30-to-1
  50. 013
    Why Search Keeps Rediscovering the Same Workflow, and What That Means
  51. 012
    Why AI Coding Agents Keep Trying to Debug Without a Debugger
  52. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
  53. 010
    When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
  54. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
  55. 006
    What Happens Inside Claude When It Decides to Blackmail Someone
  56. 005
    Why a Debugger Designed for Humans Is the Wrong Tool for an AI Agent
  57. 003
    How to Pick the Best of Sixteen Coding Agent Rollouts
  58. 002
    An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light
  59. 001
    When AI Models Quietly Protect Each Other From Shutdown

Related concepts