Definition
A collection of hard reasoning tasks pulled from BIG-Bench to stress-test language models.
BBH is the subset of BIG-Bench tasks on which contemporaneous language models fell short of average human-rater performance.
Also called: BBH