Definition
A self-improving training framework where a model grades its own outputs using rubrics it co-evolves.
A self-evolving post-training scheme in which a frozen small judge applies rubrics produced by a co-trained generator, bootstrapped via temporal-contrast preference pairs without external supervision.