Concept · 1 episode

GPQA


Definition

GPQA (Graduate-Level Google-Proof Q&A) is a benchmark of multiple-choice science questions in biology, physics, and chemistry, written and validated by domain experts. The questions are designed to resist simple web lookup (hence "Google-proof") and to require genuine graduate-level reasoning. Its hardest, most stringently filtered subset is known as GPQA Diamond. The benchmark is widely used to track the reasoning ability of frontier models, and cases where a model's GPQA score diverges sharply from its performance on other reasoning tests are treated as important signals about what these benchmarks actually measure.
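Each GPQA question offers four answer choices, one correct and three distractors, so results are typically reported as plain accuracy against a 25% random-guess baseline. The sketch below illustrates that scoring under stated assumptions. The `Question` record, `build_options`, and `accuracy` are hypothetical names, not part of any official GPQA tooling. Shuffling the options per question is shown as one common precaution against answer-position bias, not as a prescribed protocol.

```python
import random
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record mirroring the four-option format:
    # one correct answer plus three distractors.
    prompt: str
    correct: str
    distractors: list[str]


def build_options(q: Question, rng: random.Random) -> tuple[list[str], int]:
    """Shuffle the four options; return them with the index of the correct one."""
    options = [q.correct, *q.distractors]
    rng.shuffle(options)
    return options, options.index(q.correct)


def accuracy(predictions: list[int], answers: list[int]) -> float:
    """Fraction of questions whose predicted option index matches the key."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("need equal-length, non-empty lists")
    hits = sum(p == a for p, a in zip(predictions, answers))
    return hits / len(answers)


# Expected score from uniform guessing over four options.
CHANCE_BASELINE = 1 / 4

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is reproducible
    questions = [
        Question("Q1", "A", ["B", "C", "D"]),
        Question("Q2", "W", ["X", "Y", "Z"]),
    ]
    keys = [build_options(q, rng)[1] for q in questions]
    # Answer the first question correctly and the second incorrectly.
    preds = [keys[0], (keys[1] + 1) % 4]
    print(accuracy(preds, keys), "vs chance", CHANCE_BASELINE)
```

Here the first prediction matches its key and the second does not, so the script prints an accuracy of 0.5 next to the 0.25 chance baseline. A score near 0.25 on a four-option benchmark is indistinguishable from guessing, which is why GPQA results are read relative to that floor.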

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.