Definition
A long-context benchmark built around very long roleplay transcripts.
A long-context aggregation benchmark requiring synthesis across multi-session Dungeons-and-Dragons transcripts, with inputs reaching hundreds of thousands of tokens.