Concept · 11 episodes

Agentic RL


Definition

Agentic RL applies reinforcement learning directly to multi-step, tool-using agent trajectories, training the model to take sequences of actions that lead to rewarded outcomes. It generalizes RLHF beyond single-turn responses and brings classic RL headaches (credit assignment, exploration, reward hacking) into the LLM era.
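To make the credit assignment problem concrete, here is a minimal sketch assuming a toy three-step trajectory in which only the final action is rewarded. The `Step` class, the action names, and the choice of a discounted return-to-go are illustrative assumptions, not a method taken from any of the episodes listed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str    # hypothetical label for a tool call or final answer
    reward: float  # per-step reward; often 0 until the episode ends


def discounted_returns(steps: List[Step], gamma: float = 0.9) -> List[float]:
    """Compute the return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(steps)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward.
    for t in reversed(range(len(steps))):
        running = steps[t].reward + gamma * running
        returns[t] = running
    return returns


# A toy trajectory: two tool calls with no reward, then a rewarded outcome.
trajectory = [
    Step("search_docs", 0.0),
    Step("run_tests", 0.0),
    Step("submit_patch", 1.0),
]

returns = discounted_returns(trajectory, gamma=0.5)
for step, g in zip(trajectory, returns):
    print(f"{step.action}: {g}")
```

With a sparse outcome reward, earlier actions receive credit only through discounting, so the signal says nothing about which intermediate tool call actually mattered. That is one way of seeing why credit assignment resurfaces once trajectories span many steps.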

Episodes covering this

  1. 078
    Training a Markdown File: When LLM Self-Improvement Borrows the Discipline of Neural Net Training
    Yang, Gong, Huang et al. · Microsoft·28 min·May 25, 2026
  2. 068
    The OS Trick That Makes Tree Search Practical for Coding Agents
    Dong, He, Hou et al. · Institute of Parallel and Distributed Systems·27 min·May 22, 2026
  3. 067
    An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
    Tsoukalas, Kovsharov, Shirobokov et al. · Google DeepMind·31 min·May 22, 2026
  4. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
    Ye, Liu, Wang et al. · University of Illinois Urbana-Champaign·30 min·May 22, 2026
  5. 060
    When Splitting One Model Across Three Agents Doubles Its Accuracy
    Lu, Fang, Zhong et al. · University of Georgia·26 min·May 20, 2026
  6. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
    Ye, Shi, Liu et al. · University of Science and Technology of China / Meituan·23 min·May 18, 2026
  7. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
    Peng, Yao, Wu et al. · Microsoft Research·28 min·May 15, 2026
  8. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
    Gandhi, Chakraborty, Wang et al. · Carnegie Mellon University·23 min·May 08, 2026
  9. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
    Zhai, Yan, Shao et al. · Fudan University·23 min·May 02, 2026
  10. 010
    When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
    Wang, Gui, Jin et al. · Northwestern University·22 min·May 02, 2026
  11. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
    Wang, Gooding, Hartmann et al. · Google DeepMind·24 min·May 02, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.