SimpleRL-Zoo

Definition

An open-source, GRPO-based RL post-training recipe for reasoning models on math problems. It serves as a comparison baseline, specifically a cost baseline against ReasonMaxxer.

Mentioned in 1 episode

  1. Episode 026: What RL Actually Does to Language Models, at the Token Level