Definition
Reinforcement learning is a framework in which an agent learns to act in an environment by maximizing cumulative reward, with no explicit supervision on individual actions. In the LLM era, it’s how models are shaped after pretraining: from preferences, from rubrics, from outcomes.
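To make "maximizing cumulative reward with no supervision on individual actions" concrete, here is a minimal sketch using tabular Q-learning on a toy five-cell corridor. Everything in it is an illustrative assumption rather than something from the episodes: the environment, the names `step`, `train`, and `greedy_policy`, and the hyperparameters. Q-learning is used only because it is one of the simplest RL algorithms; LLM post-training typically relies on policy-gradient methods instead. The agent is never told which action is correct. It only receives a reward of 1.0 on reaching the last cell.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4, goal at cell 4.
N_STATES = 5
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1


def step(state, action):
    """Move one cell (bumping into the wall at cell 0); reward 1.0 only at the goal."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice((LEFT, RIGHT))  # explore at random
            next_state, reward = step(state, action)
            # No bootstrapping from the terminal state.
            bootstrap = 0.0 if next_state == GOAL else max(q[next_state])
            target = reward + gamma * bootstrap  # discounted return estimate
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]
```

Calling `greedy_policy(train())` returns `[1, 1, 1, 1]`: the agent moves right in every cell, a behavior it recovered from the single goal reward alone. The discount factor `gamma` is what makes earlier cells prefer the shorter path, since reward that arrives later counts for less.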
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.
- Is Reinforcement Learning (Not) the Solution to Robust Language Agent Tasks?
- GRPO: Group Relative Policy Optimization for Mathematical Reasoning
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
- Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters
- Neural Architecture Search with Reinforcement Learning