MaR · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A training method that rewards models for showing they know what they need and for following their own plans.

As stated in the literature

Metacognition as Reward — an RL framework whose reward combines knowledge monitoring, regulation fidelity (with multiplicative shortcut penalty), and correctness, derived from Flavell's metacognition framework.

Also called: Metacognition as Reward

Why it matters: Rewarding good self-monitoring, not just correct answers, may produce models whose reasoning is more honest about its own uncertainty.

For example, a model gets bonus reward for noting 'I'm not sure about the third step, let me verify' and then actually doing the verification before giving its answer.

Heard on the show

“… It's a paper called "Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals," it went up on arXiv on …”

Episode 079 — An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models

Mentioned in 1 episode

079
An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models

Related terms

Flavell metacognition reinforcement learning shortcut penalty