Definition
Agentic RL applies reinforcement learning directly to multi-step, tool-using agent trajectories, training the model to take sequences of actions that lead to rewarded outcomes. It generalizes RLHF beyond single-turn responses, and it brings classic RL headaches (credit assignment across long action sequences, exploration, and reward hacking) into LLM training.
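To make the credit assignment problem concrete, here is a minimal sketch, assuming a sparse reward that arrives only at the end of an episode. The `Step` type, the action names, and the `returns_to_go` helper are hypothetical and chosen for illustration; they do not come from any particular agentic RL library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str    # hypothetical tool call made by the agent
    reward: float  # reward observed after this step (often 0 until the end)


def returns_to_go(trajectory: List[Step], gamma: float = 1.0) -> List[float]:
    """Discounted return from each step to the end of the episode."""
    returns = [0.0] * len(trajectory)
    running = 0.0
    # Walk backwards so each step accumulates all later rewards.
    for i in range(len(trajectory) - 1, -1, -1):
        running = trajectory[i].reward + gamma * running
        returns[i] = running
    return returns


# A three-step episode where only the final outcome is rewarded.
episode = [
    Step("search_docs", 0.0),
    Step("edit_file", 0.0),
    Step("run_tests", 1.0),
]

undiscounted = returns_to_go(episode, gamma=1.0)
discounted = returns_to_go(episode, gamma=0.5)
```

With `gamma=1.0` every step receives the same return, so the signal alone cannot say which action mattered; discounting is one simple heuristic for weighting steps closer to the outcome more heavily.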
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.
- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
- Is Reinforcement Learning (Not) the Solution to Robust Language Agent Tasks?
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents
- ELLM: Exploring with Large Language Models
- FunSearch: Making new discoveries in mathematical sciences using large language models
- Cognitive Architectures for Language Agents
- ExpeL: LLM Agents Are Experiential Learners
- TextGrad: Automatic 'Differentiation' via Text
- Reflexion: Language Agents with Verbal Reinforcement Learning