Definition
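Training a model by rewarding it only when its final answer is correct, on tasks where correctness can be checked mechanically.
RLVR (Reinforcement Learning with Verifiable Rewards): an RL training paradigm whose reward comes solely from automatically verifiable correctness checks, typically a scalar signal (e.g., 1 for a correct answer, 0 otherwise) produced by a checker such as a calculator, compiler, or formal verifier. It is the foundation of DeepSeek-R1-style reasoning training.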
Also called: RLVR
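To make "verifiable reward" concrete, here is a minimal sketch of a binary reward function for math-style tasks. It assumes a hypothetical output convention in which the model ends its completion with a line like "Answer: 42". The function returns 1.0 if the last such answer numerically equals the reference and 0.0 otherwise. The marker, the regex, and the function names are illustrative assumptions, not the checker used by DeepSeek-R1 or any specific framework. Real verifiers (symbolic math checkers, unit tests run against compiled code, proof checkers) are usually more elaborate.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical convention: the model states its final answer as "Answer: <number>".
ANSWER_PATTERN = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the number after the last 'Answer:' marker, or None if absent."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary RLVR-style reward: 1.0 if the final answer matches, else 0.0.

    Fraction parses decimal strings exactly, so "0.50" and "0.5" compare
    equal without floating-point rounding issues.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        # No parseable answer: the check fails, so no reward.
        return 0.0
    try:
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        # Malformed reference string: treat as unverifiable.
        return 0.0


if __name__ == "__main__":
    print(verifiable_reward("The total is 42.\nAnswer: 42", "42"))  # 1.0
    print(verifiable_reward("Answer: 41", "42"))                     # 0.0
    print(verifiable_reward("I think it is 42", "42"))               # 0.0
```

Because the reward comes from a deterministic check rather than a learned reward model, it cannot be flattered by fluent but wrong reasoning. The trade-off is that it only applies to tasks with a mechanically checkable answer, and it depends on the model following the expected output format.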