Glossary · Term

Humanity's Last Exam

← all terms

Definition

A very hard benchmark of expert-level questions across many fields used to push top AI systems.

A frontier evaluation suite of expert-level multi-domain questions used to probe ceiling performance of capable models.

Also called: HLE

Mentioned in 1 episode

  1. 021
    Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents