BBH

Definition

A collection of hard reasoning tasks pulled from BIG-Bench to stress-test language models.

BBH stands for BIG-Bench Hard: the subset of 23 BIG-Bench tasks on which language models, at the time the subset was assembled, still fell short of average human-rater performance. It is widely used as a multi-task reasoning benchmark.

Mentioned in 1 episode

  1. 060
    When Splitting One Model Across Three Agents Doubles Its Accuracy