Glossary · Term

RLVR


Definition

Training a model by rewarding it when its final answer is correct, on tasks where correctness can be checked mechanically.

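Reinforcement Learning with Verifiable Rewards: an RL paradigm whose only training signal is a scalar correctness reward computed by an automatic check, such as matching a math answer against a reference or running unit tests on generated code, rather than a score from a learned reward model. It is the foundation of DeepSeek-R1-style reasoning training.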
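To make "verifiable scalar correctness signal" concrete, here is a minimal sketch of a binary reward function for math-style answers. It assumes, hypothetically, that the model ends its output with a line of the form `Answer: <value>`; the names `extract_final_answer` and `verifiable_reward` are illustrative and not taken from any particular RLVR codebase. Real pipelines use their own answer formats and checkers (for code tasks, unit tests replace the comparison).

```python
import re
from fractions import Fraction
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker (hypothetical format), or None."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer earns no reward
    try:
        # Compare as exact rationals so "0.5" and "1/2" count as the same answer.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to an exact string comparison.
        return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    print(verifiable_reward("6 * 7 = 42.\nAnswer: 42", "42"))
    print(verifiable_reward("Answer: 41", "42"))
```

The reward is deliberately all-or-nothing: the reasoning text before the answer is never scored directly, only the mechanically checked final result.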

Also called: reinforcement learning with verifiable rewards

Mentioned in 1 episode

  1. 079
    An Old Idea From Cognitive Psychology Reshapes How We Reward Reasoning Models