Theme · 41 episodes

Agentic AI

Definition

Agentic AI refers to AI systems that take goal-directed actions over multiple steps in an environment (calling tools, browsing the web, editing files) rather than producing a single response to a single prompt. This shift introduces a new class of risks tied to autonomy, long task horizons, and irreversible actions.

Episodes covering this

  1. 078
    Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
    Yang, Gong, Huang et al. · Microsoft · 28 min · May 25, 2026
  2. 076
    Same Model, Organized Differently: How an Agent Architecture Beat Frontier Systems at Research Math
    Zhao, Yuan, Choi et al. · Georgia Institute of Technology · 22 min · May 25, 2026
  3. 072
    A Robot Made Graphene Without Help, And Caught Itself Hallucinating
    Shi, Zheng, Juan et al. · Princeton University · 29 min · May 23, 2026
  4. 071
    When the Model Is Fine and the Plumbing Is Broken: Fixing Agents at the Interface
    Xu, Wen, Li · Peking University · 23 min · May 22, 2026
  5. 068
    The OS Trick That Makes Tree Search Practical for Coding Agents
    Dong, He, Hou et al. · Institute of Parallel and Distributed Systems · 27 min · May 22, 2026
  6. 067
    An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
    Tsoukalas, Kovsharov, Shirobokov et al. · Google DeepMind · 31 min · May 22, 2026
  7. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
    Hu, Zhang, Xu et al. · Tongyi Lab · 26 min · May 22, 2026
  8. 065
    One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery
    Agrawal, Lee, Tan et al. · UC Berkeley · 27 min · May 22, 2026
  9. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
    Ye, Liu, Wang et al. · University of Illinois Urbana-Champaign · 30 min · May 22, 2026
  10. 062
    Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety
    Zhang, Zheng, Yang · Shenzhen University · 24 min · May 20, 2026
  11. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
    Jha, Triedman, Bhattacharya et al. · Cornell University · 27 min · May 20, 2026
  12. 059
    Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward
    Lu, Wang, Lu et al. · Northeastern University · 22 min · May 20, 2026
  13. 058
    Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
    Liu, Holz, Ye et al. · University of Chinese Academy of Sciences · 32 min · May 19, 2026
  14. 057
    How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
    Li, Hu, Xu et al. · Uber Technologies · 28 min · May 19, 2026
  15. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
    Pepe, Lin, Magka et al. · FAIR at Meta · 32 min · May 18, 2026
  16. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
    Ye, Shi, Liu et al. · University of Science and Technology of China / Meituan · 23 min · May 18, 2026
  17. 051
    Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
    Zhang, Su, Chen et al. · MiroMind AI · 22 min · May 18, 2026
  18. 049
    An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
    Cuadros, Maiga · Digital Epidemiology Laboratory · 28 min · May 17, 2026
  19. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
    Peng, Yao, Wu et al. · Microsoft Research · 28 min · May 15, 2026
  20. 046
    When the AI Optimizer Edits the Grade Book: Why Harnessing Evolution Needs a Wall
    Zhang, Gu, Ruan et al. · The Hong Kong University of Science and Technology (Guangzhou) / DeepWisdom · 24 min · May 15, 2026
  21. 044
    How One Sentence and a Forged History Flip the Most Aligned Models
    Salgado · Independent Researcher · 23 min · May 15, 2026
  22. 042
    An Agentic Scientific Computing System That Actually Remembers What It Learns
    Toscano, Chai, Karniadakis · Division of Applied Mathematics · 30 min · May 13, 2026
  23. 039
    When Smarter Agents Get Fooled by Three Extra Nodes in a Database
    Kereopa-Yorke, Diaz, Wright et al. · Microsoft · 31 min · May 12, 2026
  24. 035
    Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment
    Gulati, Gupta, Lumer et al. · PricewaterhouseCoopers U.S. · 29 min · May 11, 2026
  25. 034
    Catching Multi-Agent Deadlocks Before Deployment With a 40-Year-Old Tool
    Xia, Li, Ehsan et al. · Rutgers University · 30 min · May 11, 2026
  26. 030
    Why Your AI Agent Won't Stop Working — and Each Model Falls for a Different Trap
    Xu, Wang, Zhang et al. · Zhejiang University · 30 min · May 09, 2026
  27. 029
    Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper
    Zheng, Glehn, Zwols et al. · Google DeepMind · 20 min · May 08, 2026
  28. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure
    Kamahori, Li, Peter et al. · University of Washington · 30 min · May 08, 2026
  29. 024
    An AI Agent That Found 28 Zero-Days in Windows — And What Made It Work
    Lee, Kim, Zhang · University of Illinois at Urbana-Champaign · 22 min · May 07, 2026
  30. 023
    Why a Small Agent Confidently Overwrites Memories It Doesn't Understand
    Mao, Zhao, Penn et al. · City University of Hong Kong · 23 min · May 07, 2026
  31. 022
    Training the Model Spec Directly: An Alignment Lever Aimed at the Say-Do Gap
    Li, Price, Marks et al. · Anthropic Fellows Program · 32 min · May 06, 2026
  32. 017
    When the Agent Grades Its Own Homework: A Brutal New Benchmark for AI Workers
    Aggarwal, Neubig, Welleck · CMU · 31 min · May 03, 2026
  33. 016
    Why Your Coding Agent Stalls While the GPU Runs Hot
    Wang, Ye, Xu et al. · Duke University · 24 min · May 03, 2026
  34. 014
    Why a Constrained Pipeline Beat a Full Coding Agent at Finding Bugs 30-to-1
    Shafiuzzaman, Desai, Guo et al. · University of California · 32 min · May 03, 2026
  35. 013
    Why Search Keeps Rediscovering the Same Workflow, and What That Means
    Du, Liu, Du et al. · Carnegie Mellon University · 22 min · May 03, 2026
  36. 012
    Why AI Coding Agents Keep Trying to Debug Without a Debugger
    Liu, Wang, Chen et al. · Sun Yat-sen University · 21 min · May 02, 2026
  37. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
    Zhai, Yan, Shao et al. · Fudan University · 23 min · May 02, 2026
  38. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
    Wang, Gooding, Hartmann et al. · Google DeepMind · 24 min · May 02, 2026
  39. 005
    Why a Debugger Designed for Humans Is the Wrong Tool for an AI Agent
    Xiang, Xu, Chu et al. · Southern University of Science and Technology · 22 min · May 01, 2026
  40. 003
    How to Pick the Best of Sixteen Coding Agent Rollouts
    Kim, Yang, Niu et al. · Meta Superintelligence Labs / University of Washington · 17 min · May 01, 2026
  41. 002
    An AI Ran a Real Optics Lab for 21 Hours and Found a Transformer-Shaped Pattern in Light
    Yang, Chen, Zhao et al. · Zhejiang University · 29 min · May 01, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.