Definition
Training a model by checking if its final answer is correct on tasks where you can mechanically verify the answer.
Reinforcement Learning with Verifiable Rewards — an RL paradigm using only verifiable scalar correctness signals, foundation of DeepSeek-R1 style reasoning training.
Also called: reinforcement learning with verifiable rewards