Concept · 2 episode(s)

Process Reward Models


Definition

Process reward models (PRMs) score each step of a reasoning trajectory rather than just the final answer, giving denser feedback for training and search. They’re harder to build than outcome reward models because you need step-level labels, but per-step scores let search at inference time rank or prune partial trajectories instead of waiting for a finished answer.
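To make the contrast concrete, here is a minimal Python sketch of the two scoring styles. Everything in it is an assumption for illustration: `toy_step_scorer` is a hypothetical stand-in for a learned process reward model (it just looks for a `[bad]` tag), the product is only one possible way to aggregate step scores (the minimum is another), and `best_of_n` is a simple reranker rather than any particular paper's search method.

```python
from math import prod
from typing import Callable, List

Trajectory = List[str]  # one string per reasoning step; the last step is the answer


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a learned process reward model.

    Returns a made-up probability that the step is correct. A real model
    would be trained on step-level labels; here we only look for a tag.
    """
    return 0.1 if "[bad]" in step else 0.9


def process_score(traj: Trajectory, step_scorer: Callable[[str], float]) -> float:
    """Score every step, then aggregate (assumption: product of step scores)."""
    return prod(step_scorer(step) for step in traj)


def outcome_score(traj: Trajectory, expected_answer: str) -> float:
    """Outcome-style scoring: only the final step is checked."""
    return 1.0 if traj[-1].strip() == expected_answer else 0.0


def best_of_n(candidates: List[Trajectory],
              step_scorer: Callable[[str], float]) -> Trajectory:
    """Pick the candidate whose steps score highest under the process scorer."""
    return max(candidates, key=lambda traj: process_score(traj, step_scorer))


# Two candidates that reach the same final answer, one through flawed steps.
candidates = [
    ["2 + 3 = 6 [bad]", "6 * 2 = 10 [bad]", "answer: 10"],
    ["2 + 3 = 5", "5 * 2 = 10", "answer: 10"],
]

# Outcome scoring cannot tell the two apart; step-level scoring can.
outcome_scores = [outcome_score(t, "answer: 10") for t in candidates]
best = best_of_n(candidates, toy_step_scorer)
```

Both candidates get the same outcome score because they end on the same answer, while the step-level scorer prefers the second one. The same per-step scores could also be applied to unfinished trajectories, which is what makes them usable during search.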

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.