Glossary · Term

Open-Reasoner-Zero

Definition

A published RL pipeline used as a high-cost baseline for reasoning model training.

An open recipe for large-scale PPO training of reasoning models. The ReasonMaxxer comparisons use it as a cost-and-quality reference point, with an estimated cost of about $103K to train a 32B reasoning model.

Mentioned in 1 episode

  1. 026
    What RL Actually Does to Language Models, at the Token Level