Definition
Process reward models score each step of a reasoning trajectory rather than just the final answer, giving denser feedback for training and search. They’re harder to build than outcome reward models — you need step-level labels — but they support much more capable reasoning-time search.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.