Theme · 13 episodes

Reinforcement Learning

Definition

Reinforcement learning is a framework in which an agent learns to act in an environment by maximizing cumulative reward, with no explicit supervision on individual actions. In the LLM era, it is how models are shaped after pretraining: from preferences, from rubrics, and from outcomes.
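
To make "maximizing cumulative reward, with no explicit supervision on individual actions" concrete, here is a minimal sketch, not taken from any of the episodes below. It assumes a toy two-armed bandit environment; the names `discounted_return` and `run_bandit`, the epsilon-greedy rule, and all parameter values are illustrative choices. The agent is never told which arm is correct; it only observes the reward each action produces.

```python
import random


def discounted_return(rewards, gamma=0.99):
    """Cumulative reward for one episode: sum of gamma**t * r_t."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def run_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli bandit (hypothetical toy setup).

    arm_means[a] is the probability that arm `a` pays reward 1.0.
    Returns the agent's value estimates and how often it pulled each arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # number of pulls per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward.
            action = max(range(n_arms), key=lambda a: estimates[a])

        # The only feedback is a scalar reward, never a "correct action" label.
        reward = 1.0 if rng.random() < arm_means[action] else 0.0

        counts[action] += 1
        # Incremental mean update of the chosen arm's estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(run_bandit([0.2, 0.8]))
```

Under these assumptions the agent ends up pulling the better-paying arm far more often, purely because doing so raises its reward. Post-training an LLM follows the same logic at much larger scale, with the reward coming from preferences, rubrics, or task outcomes instead of a coin flip.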

Episodes covering this

  1. 075
    Growing Code and Proof Together: Verified Systems in Ten Hours Instead of a Year
    Agarwal, Krentsel, Liu et al. · UC Berkeley · 28 min · May 25, 2026
  2. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
    Ye, Liu, Wang et al. · University of Illinois Urbana-Champaign · 30 min · May 22, 2026
  3. 060
    When Splitting One Model Across Three Agents Doubles Its Accuracy
    Lu, Fang, Zhong et al. · University of Georgia · 26 min · May 20, 2026
  4. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
    Pepe, Lin, Magka et al. · FAIR at Meta · 32 min · May 18, 2026
  5. 042
    An Agentic Scientific Computing System That Actually Remembers What It Learns
    Toscano, Chai, Karniadakis · Division of Applied Mathematics · 30 min · May 13, 2026
  6. 040
    Two Frozen Models Learn to Whisper: Coupling Through Hidden States
    Flamant, Ghai, Shimizu · AWS Agentic AI · 29 min · May 13, 2026
  7. 034
    Catching Multi-Agent Deadlocks Before Deployment With a 40-Year-Old Tool
    Xia, Li, Ehsan et al. · Rutgers University · 30 min · May 11, 2026
  8. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization
    Gandhi, Chakraborty, Wang et al. · Carnegie Mellon University · 23 min · May 08, 2026
  9. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
    Du, Ye, Tang et al. · Shanghai Jiao Tong University · 14 min · May 06, 2026
  10. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
    Zhai, Yan, Shao et al. · Fudan University · 23 min · May 02, 2026
  11. 010
    When Reward Climbs But Reasoning Goes Generic: Diagnosing Template Collapse in Agentic RL
    Wang, Gui, Jin et al. · Northwestern University · 22 min · May 02, 2026
  12. 008
    Why Long-Horizon AI Agents Get Stuck, and a Milestone-Based Fix That Helps
    Wang, Gooding, Hartmann et al. · Google DeepMind · 24 min · May 02, 2026
  13. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training
    Jang, Falck, Braun et al. · MATS · 23 min · May 02, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.