ARC-AGI-2 · Glossary · AI Papers: A Deep Dive

Definition

Plain language

The current version of ARC, a benchmark of small visual reasoning puzzles designed to be easy for humans and hard for AI.

As stated in the literature

The second iteration of the Abstraction and Reasoning Corpus benchmark, a harder set of grid-based abstract reasoning tasks used to probe general intelligence in AI systems.

Why it matters: As models start to crack the original ARC, ARC-AGI-2 raises the bar so the benchmark keeps measuring genuine abstract reasoning rather than memorized tricks.

For example, an ARC-AGI-2 puzzle might require composing two abstract transformations in sequence, where each alone is easy but the combination is novel.
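To make the idea of composition concrete, here is a minimal Python sketch. It is an illustration only, not an actual ARC-AGI-2 task or the benchmark's own tooling: the grid representation (lists of integer color codes), the two transformations, and every function name are assumptions chosen for the example.

```python
from typing import Callable, List

# Assumption: a grid is a list of rows, each cell an integer color code (0 = background).
Grid = List[List[int]]
Transform = Callable[[Grid], Grid]


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothetical transformation 1: flip every row horizontally."""
    return [list(reversed(row)) for row in grid]


def recolor(grid: Grid, old: int, new: int) -> Grid:
    """Hypothetical transformation 2: repaint every cell of color `old` as `new`."""
    return [[new if cell == old else cell for cell in row] for row in grid]


def compose(*steps: Transform) -> Transform:
    """Chain transformations so they are applied left to right."""
    def run(grid: Grid) -> Grid:
        for step in steps:
            grid = step(grid)
        return grid
    return run


# The "hidden rule" of this toy puzzle: mirror first, then turn color 1 into color 2.
puzzle_rule = compose(mirror_left_right, lambda g: recolor(g, old=1, new=2))

example_input = [
    [1, 0, 0],
    [1, 1, 0],
]
example_output = puzzle_rule(example_input)
print(example_output)  # [[0, 0, 2], [0, 2, 2]]
```

Each step is trivial on its own. The difficulty for a solver lies in inferring from a few input and output pairs that both steps apply, and in which order.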

Heard on the show

“The benchmark is ARC-AGI-2, and the score is roughly seventy-three percent, against about fifty-four for the best single models.”

Episode 191 — How One Researcher Beat GPT-5.2 and Gemini 3 by Judging Their Answers, Not Improving Them

Mentioned in 2 episodes

Related terms

linear probe