Concept · 5 episodes

Rubric Generation

Definition

Rubric generation is the automatic production of structured grading criteria for a task. A common setup has a strong model write the rubric and one or more weaker models apply it. This lets evaluation scale, but because a model still does the grading, the approach inherits the failure modes of LLM-as-judge, such as position bias and verbosity bias.
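To make the write-then-apply split concrete, here is a minimal Python sketch of the "apply" half, under stated assumptions. The rubric is taken as already written (by whatever strong model produced it), and each judge is modeled as a plain function that answers yes or no for one criterion. The names `Criterion`, `score_response`, and `toy_judge` are hypothetical and used only for illustration; the majority vote and weighted sum are one plausible aggregation, not a method taken from any of the episodes below.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One rubric line: what to check and how much it counts."""
    description: str
    weight: float


# A judge receives (criterion description, response) and returns True if
# it believes the response satisfies the criterion. In practice this would
# wrap a model call; here it is just a function type.
Judge = Callable[[str, str], bool]


def score_response(rubric: List[Criterion], response: str,
                   judges: List[Judge]) -> float:
    """Return the weighted fraction of criteria met.

    Each criterion is decided by strict majority vote across the judges,
    so a single unreliable judge cannot flip the outcome on its own.
    """
    if not judges:
        raise ValueError("at least one judge is required")
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")

    earned = 0.0
    for criterion in rubric:
        yes_votes = sum(1 for judge in judges
                        if judge(criterion.description, response))
        if yes_votes * 2 > len(judges):
            earned += criterion.weight
    return earned / total_weight


def toy_judge(description: str, response: str) -> bool:
    """Stand-in for a model judge: checks whether the last word of the
    criterion appears in the response. Purely illustrative."""
    keyword = description.split()[-1].lower()
    return keyword in response.lower()


rubric = [
    Criterion("states the units", 2.0),
    Criterion("cites a source", 1.0),
]
response = "The speed is 3 m/s, with units given explicitly."

# Three copies of the toy judge vote on each criterion.
score = score_response(rubric, response, [toy_judge] * 3)
# The first criterion passes (weight 2.0) and the second fails,
# so the score is 2.0 out of a total weight of 3.0.
```

Scoring each criterion separately, rather than asking for a single overall grade, keeps every judge decision small and auditable. It does not remove judge bias: a judge that is wrong about a criterion is still wrong, which is why the vote is spread over several judges here.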

Episodes covering this

205
The Same AI, Two Labels: How the Pitch Beat the Product in 162 Sessions
Rating the Pitch, Not the Product: User Evaluations of LLMs Reflect Expectations More Than Performance
13 min·Jul 07, 2026
178
How an AI Reviewer Learned to Stop Going Easy on AI Writing
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
Iacob, Jovanović, Shen et al. · University of Cambridge·23 min·Jun 26, 2026
132
The Agent Failed — But Did the Instructions Deserve to Be Followed?
SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement
Gautam, Radhakrishna, Gulwani · Microsoft·30 min·Jun 11, 2026
082
Training a Deep Research Agent on 8,000 Synthetic Tasks: The Rubric Tree Trick
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
Xie, Lin, Wang et al. · The Ohio State University·31 min·May 26, 2026
019
When the Best Reward Model Trains the Worst Policy: Inside EvoLM
EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
Li, Xin, Xiao et al. · University of Washington·26 min·May 06, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena