Glossary · Term

MMLU

Definition

Plain language

A wide-ranging multiple-choice benchmark covering many academic subjects.

As stated in the literature

Massive Multitask Language Understanding, a benchmark of multiple-choice questions across 57 subjects used to evaluate broad knowledge in language models.

Also called: M-M-L-U

Why it matters: It serves as a broad, widely cited check on whether a model has general knowledge across many academic fields.

For example, MMLU might ask a model a question on constitutional law and another on organic chemistry mechanisms.
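
To make the scoring concrete, here is a minimal sketch of how a multiple-choice benchmark of this kind is typically graded: each question has one correct option, and the reported score is the fraction answered correctly. The `Item` class, the `accuracy` function, and the placeholder questions are hypothetical names invented for illustration, not part of any official MMLU tooling, and the four-option A-D labeling is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One multiple-choice question (hypothetical structure for illustration)."""
    subject: str
    question: str
    choices: list[str]  # assumed here: four options, labeled A-D in order
    answer: str         # gold label, e.g. "B"


def accuracy(items, predictions):
    """Return the fraction of items whose predicted letter matches the gold letter."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(pred == item.answer for item, pred in zip(items, predictions))
    return correct / len(items)


# Placeholder items, not real benchmark questions.
items = [
    Item("constitutional law", "Placeholder question 1?", ["w", "x", "y", "z"], "B"),
    Item("organic chemistry", "Placeholder question 2?", ["w", "x", "y", "z"], "D"),
]
predictions = ["B", "A"]  # hypothetical model outputs

print(accuracy(items, predictions))  # 0.5
```

In this toy run the model gets the law question right and the chemistry question wrong, so it scores 0.5. A real evaluation would do the same thing over many more questions, often reporting accuracy per subject as well as overall.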

Heard on the show

“That's the standard computer-use benchmark, the GUI agent equivalent of MMLU.”

Episode 080 — How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents

Mentioned in 6 episodes

080 · How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents
074 · How a Fifteen-Hundred-Dollar Training Run Matched Llama and Gemma on Reasoning
060 · When Splitting One Model Across Three Agents Doubles Its Accuracy
058 · Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
055 · Why LLM Judges Flip Their Verdicts When You Change the Question Format
025 · The Missing Gradient Term That Predicts Sycophancy in RLHF