EvoLM

Definition

A self-improving training framework in which a model grades its own outputs using rubrics that it co-evolves.

A self-evolving post-training scheme in which a small, frozen judge applies rubrics produced by a co-trained generator. Training is bootstrapped from temporal-contrast preference pairs, with no external supervision.

Mentioned in 1 episode

  1. 019
    When the Best Reward Model Trains the Worst Policy: Inside EvoLM