Definition
A small set of extremely hard graduate-level science questions used to test AI reasoning.
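The hardest subset of GPQA (Graduate-Level Google-Proof Q&A), a benchmark of expert-validated, graduate-level multiple-choice science questions in physics, chemistry, and biology. "Google-proof" means the questions are hard to answer correctly even with unrestricted web search.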
Also called: GPQA, G-P-Q-A-Diamond