Glossary · Term

GRPO

← all terms

Definition

A reinforcement-learning recipe that compares several attempts at the same task to figure out which ones to reinforce.

Group Relative Policy Optimization, an RL method that computes advantages by comparing a group of rollouts on the same prompt without a separate value model.

Also called: G-R-P-O

Mentioned in 11 episodes

  1. 073
    When Three LLMs Talk to Each Other, Their Ideas Quietly Stop Moving
  2. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
  3. 064
    When Agent Memory Stops Being a Database and Starts Being a Skill
  4. 059
    Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward
  5. 052
    An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents
  6. 051
    Why Parallel Sampling Plateaus, And What Evidence Graphs Do Instead
  7. 048
    How a 30B Open Model Reached Olympiad Gold With the Right Recipe
  8. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI
  9. 026
    What RL Actually Does to Language Models, at the Token Level
  10. 011
    When RL Actually Teaches Agents Something New, And When It Doesn't
  11. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training

Related concepts