EvoLM · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A self-improving training framework where a model grades its own outputs using rubrics it co-evolves.

As stated in the literature

A self-evolving post-training scheme in which a frozen small judge applies rubrics produced by a co-trained generator, bootstrapped via temporal-contrast preference pairs without external supervision.

Why it matters: It points toward post-training that improves without expensive human labels, which matters as models move into domains where humans can no longer easily judge their outputs.

For example, the model generates an answer, scores it against a rubric it helped design, and trains on the result, with no human grader involved.
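The scoring step in that example can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: the `Criterion` class, the `score` and `preference_pair` helpers, and the string-matching checks are all hypothetical stand-ins. In the scheme described above, the rubric would come from a trained generator and the scoring from a frozen judge model, and the temporal-contrast pairing is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Criterion:
    """One rubric item: a named check with a weight (hypothetical structure)."""
    name: str
    weight: float
    check: Callable[[str], bool]


def score(answer: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria the answer satisfies."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(answer))
    return earned / total if total else 0.0


def preference_pair(a: str, b: str, rubric: List[Criterion]) -> Tuple[str, str]:
    """Return (chosen, rejected) ordered by rubric score; ties keep input order."""
    return (a, b) if score(a, rubric) >= score(b, rubric) else (b, a)


# Toy rubric standing in for one a rubric generator might produce (assumption).
rubric = [
    Criterion("states a final answer", 2.0, lambda s: "answer:" in s.lower()),
    Criterion("shows a working step", 1.0, lambda s: "=" in s),
]

candidates = ["Answer: 12", "3 * 4 = 12. Answer: 12"]
chosen, rejected = preference_pair(candidates[0], candidates[1], rubric)
```

Here the second candidate satisfies both criteria and becomes `chosen`. A pair like this is the kind of signal a preference-based training step could consume in place of a human label.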

Heard on the show

“The paper is "EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics," posted to arXiv yesterday — recorded a day later.”

Episode 019 — When the Best Reward Model Trains the Worst Policy: Inside EvoLM

Mentioned in 1 episode

019
When the Best Reward Model Trains the Worst Policy: Inside EvoLM

Related terms

freeze · post-training · rubric