Glossary · Term

RULER

← all terms

Definition

A benchmark suite for testing how reliably a model can find specific information buried in long inputs.

A long-context evaluation suite covering needle-in-haystack, multi-key, multi-value, and aggregation tasks under varying context lengths.

Mentioned in 2 episodes

  1. 036
    Sparse Attention Was the Wrong Frame. Treat It as Geometry Instead.
  2. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval