Definition
A reward you can compute automatically from the answer, without needing a human grader.
A scalar training signal derived from mechanical verification of task completion (for example by a calculator, compiler, simulator, or formal verifier). It enables scalable RL training but provides only outcome-level supervision: it scores the final result, not the intermediate steps that produced it.
Also called: verifiable rewards
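
Example
A minimal sketch of the definition above, assuming an arithmetic task whose reference answer is known in advance. The function name `verifiable_reward` and the binary 1.0/0.0 scale are illustrative assumptions, not part of any particular library or training setup. Exact rational comparison stands in for the "calculator" style of verifier.

```python
from fractions import Fraction


def verifiable_reward(model_answer: str, expected: str) -> float:
    """Hypothetical checker: 1.0 if the final answer equals the reference, else 0.0.

    Only the final answer is inspected (outcome-level supervision);
    the reasoning that led to it is never seen by this function.
    """
    try:
        # Fraction parses integers, decimals, and "a/b" strings exactly,
        # so "0.5" and "1/2" compare equal without floating-point error.
        matches = Fraction(model_answer.strip()) == Fraction(expected.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable or malformed answers earn no reward.
        return 0.0
    return 1.0 if matches else 0.0


if __name__ == "__main__":
    print(verifiable_reward("0.5", "1/2"))
    print(verifiable_reward("3", "4"))
    print(verifiable_reward("four", "4"))
```

Because the check is purely mechanical, it can be run on every sampled answer without a human grader. The same property is its limitation: a correct answer reached through flawed steps still receives full reward.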