Definition
A separate model or program that grades whether an answer is correct.
A model or deterministic checker that scores candidate outputs, used to provide reward signals or filter rollouts in training and evaluation.
Also called: verifiers
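As a minimal sketch of the deterministic-checker case described above, the following Python scores candidate outputs against a reference answer and keeps only the rollouts that pass. The names `exact_match_checker`, `filter_rollouts`, and `threshold` are hypothetical and chosen for illustration only; a real setup might use a learned model in place of the exact-match rule.

```python
from typing import Callable, List, Tuple


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not count as wrong answers.
    return " ".join(text.strip().lower().split())


def exact_match_checker(candidate: str, reference: str) -> float:
    """Hypothetical deterministic checker: 1.0 on a normalized match, else 0.0."""
    return 1.0 if _normalize(candidate) == _normalize(reference) else 0.0


def filter_rollouts(
    candidates: List[str],
    reference: str,
    checker: Callable[[str, str], float],
    threshold: float = 1.0,
) -> List[Tuple[str, float]]:
    """Score each candidate and keep those whose score meets the threshold."""
    scored = [(c, checker(c, reference)) for c in candidates]
    return [(c, s) for c, s in scored if s >= threshold]


# Example usage: four sampled answers to a question whose reference answer is "42".
rollouts = ["42", " 42 ", "forty-two", "41"]
kept = filter_rollouts(rollouts, "42", exact_match_checker)
```

The per-candidate score can serve as a reward signal, while the filtered list illustrates rollout filtering. Exact match is only one possible grading rule and will reject correct answers phrased differently, such as "forty-two" here.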