Glossary · Term

LongBench

Definition

Plain language

A benchmark suite for testing how well models handle long documents.

As stated in the literature

A multi-task evaluation suite for long-context understanding spanning summarization, retrieval, and reasoning over extended inputs.

Why it matters: It provides a standard way to compare long-context models on tasks more realistic than synthetic needle-in-a-haystack tests.

For example, one LongBench task gives the model a long news article and asks for a faithful summary, while another asks it to find a specific fact buried mid-document.
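To make those two task shapes concrete, here is a minimal sketch in Python. It is not LongBench's own data format or scoring code: the `Example` schema, the toy document, and both metrics are assumptions chosen for illustration, and the real benchmark defines its own per-task metrics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    """One long-context example (hypothetical schema, not LongBench's own)."""
    task: str        # "summarization" or "retrieval"
    context: str     # the long document shown to the model
    question: str    # the instruction given to the model
    reference: str   # gold answer or reference summary


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, a simple stand-in for a summary-quality metric."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def score(example: Example, prediction: str) -> float:
    """Pick a metric by task type (an assumed mapping, for illustration)."""
    if example.task == "retrieval":
        return exact_match(prediction, example.reference)
    return token_f1(prediction, example.reference)


# Build a long toy document with one fact buried in the middle.
filler = "The council met again and discussed routine matters. " * 200
fact = "The bridge reopened on 14 March. "
doc = filler + fact + filler

retrieval = Example("retrieval", doc, "When did the bridge reopen?", "14 March")
summary = Example(
    "summarization", doc, "Summarize the article.",
    "The council met and the bridge reopened",
)

retrieval_score = score(retrieval, "14 march")
summary_score = score(summary, "The bridge reopened")
```

In a real evaluation the predictions would come from the model under test, and the scores would be averaged per task before models are compared.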

Heard on the show

“And LongBench-v2 multi-document question answering, which stresses the opposite mode: heavy parallelism within a single task, agents inspecting different documents at the same time.”

Episode 130 — Why AI Agents Coordinate Better Through a Shared Board Than a Boss

Mentioned in 2 episodes

130 · Why AI Agents Coordinate Better Through a Shared Board Than a Boss
036 · Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.

Related terms

long-context