Glossary · Term

DAPO

Definition

A reinforcement-learning method that compares several attempts at the same task to figure out which ones to reinforce.

Decoupled Clip and Dynamic sAmpling Policy Optimization, a GRPO-family RL algorithm used as the optimizer in MaR-style metacognitive reward training.
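
The "compare several attempts" idea can be illustrated with the group-relative advantage that GRPO-family methods share. The sketch below is illustrative only and is not the full DAPO algorithm: the function name, the `eps` constant, and the use of population standard deviation are assumptions made for this example, and the clipping and sampling steps are left out.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each attempt relative to the other attempts at the same task.

    Hypothetical helper for illustration: subtract the group mean and
    divide by the group standard deviation (population std assumed here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four attempts at one prompt: two judged correct (1.0), two incorrect (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every attempt earns the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Attempts that beat their group's average get a positive advantage and are reinforced; those below average get a negative one. When every attempt in a group scores the same, all advantages are zero, so that group contributes nothing to the update.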

Mentioned in 1 episode

  1. 079
    An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models