Glossary · Term

Oolong-Real

Definition

A long-context benchmark built around very long roleplay transcripts.

A long-context aggregation benchmark that requires synthesizing information across multi-session Dungeons & Dragons transcripts, with inputs reaching hundreds of thousands of tokens.

Mentioned in 1 episode

  1. 028
    Teaching a Model to Hire Copies of Itself: Recursive Agent Optimization