Definition
A small set of extremely hard graduate-level science questions used to test AI reasoning.
GPQA-Diamond is the hardest subset of the Graduate-Level Google-Proof Q&A (GPQA) benchmark, covering expert-validated physics, chemistry, and biology questions.
Also called: GPQA-Diamond