Glossary · Term

BABILong

Definition

A benchmark that buries reasoning tasks inside very long natural-language passages.

A long-context evaluation suite that extends the bAbI tasks by scattering each task's supporting facts through large amounts of irrelevant text, stress-testing whether a model can retrieve those facts and reason over them at realistic document lengths.
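To make the construction concrete, here is a minimal sketch of how a sample in this style could be assembled. It is an illustration under stated assumptions, not the benchmark's actual generation code: the function name `bury_facts`, the example facts, and the filler sentences are all hypothetical.

```python
import random


def bury_facts(facts, filler, seed=0):
    """Scatter task facts among filler sentences, keeping the facts in order.

    Hypothetical helper for illustration only; not part of any BABILong release.
    Requires len(facts) <= len(filler) + 1.
    """
    rng = random.Random(seed)
    # Choose distinct gaps between filler sentences, sorted so that the
    # facts appear in their original order (order matters for bAbI-style tasks).
    slots = sorted(rng.sample(range(len(filler) + 1), len(facts)))
    slot_to_fact = dict(zip(slots, facts))

    passage = []
    for i in range(len(filler) + 1):
        if i in slot_to_fact:
            passage.append(slot_to_fact[i])
        if i < len(filler):
            passage.append(filler[i])
    return passage


# Made-up bAbI-style facts and question.
facts = [
    "Mary went to the kitchen.",
    "Mary picked up the apple.",
    "Mary travelled to the garden.",
]
question = "Where is the apple?"

# Made-up distractor text standing in for the long background passage.
filler = [
    "The harbour was quiet that morning.",
    "A clock in the hall struck nine.",
    "Rain had been falling since dawn.",
    "Nobody remembered who planted the elm.",
]

passage = bury_facts(facts, filler, seed=0)
prompt = " ".join(passage) + "\n\nQuestion: " + question
```

Growing `filler` to thousands of sentences while holding `facts` fixed gives the long-context setting the definition describes: the reasoning task stays the same, and only the amount of surrounding text changes.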

Mentioned in 1 episode

  1. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval