Glossary · Term

HellaSwag

Definition

A multiple-choice commonsense natural language inference benchmark in which a model picks the most plausible continuation of a short scenario from four candidate endings. It is a standard zero-shot evaluation for pretrained language models, including small ones.
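As a rough illustration of how "picking the most plausible continuation" is often scored in a zero-shot setting, the sketch below ranks candidate endings by their average per-token log-probability given the scenario. This is a minimal sketch under stated assumptions, not the benchmark's official harness: `pick_ending` and `toy_token_logprobs` are hypothetical names, the toy scorer stands in for a real language model, and the example item is made up rather than taken from the dataset.

```python
from typing import Callable, Sequence


def pick_ending(
    context: str,
    endings: Sequence[str],
    token_logprobs: Callable[[str, str], Sequence[float]],
    length_normalize: bool = True,
) -> int:
    """Return the index of the ending with the highest score.

    token_logprobs(context, ending) is assumed to return one log-probability
    per token of the ending, conditioned on the context.
    """
    scores = []
    for ending in endings:
        logprobs = token_logprobs(context, ending)
        total = sum(logprobs)
        # Averaging per token keeps longer endings from being penalized
        # simply for containing more tokens.
        scores.append(total / max(len(logprobs), 1) if length_normalize else total)
    return max(range(len(scores)), key=scores.__getitem__)


def toy_token_logprobs(context: str, ending: str) -> Sequence[float]:
    """Hypothetical stand-in for a model: words already seen in the
    context get a higher log-probability than unseen words."""
    seen = set(context.lower().split())
    return [-1.0 if word in seen else -3.0 for word in ending.lower().split()]


# Made-up item in the benchmark's general shape: one context, four endings.
context = "A man pours water into a kettle and puts the kettle on the stove."
endings = [
    "He waits for the water in the kettle to boil.",
    "He paints the fence with a large brush.",
    "She rides a horse across the field.",
    "The dog jumps over a sleeping cat.",
]

predicted = pick_ending(context, endings, toy_token_logprobs)
predicted_unnormalized = pick_ending(
    context, endings, toy_token_logprobs, length_normalize=False
)
print(predicted, predicted_unnormalized)
```

With this toy scorer the length-normalized choice is ending 0, while the unnormalized sum favors a shorter ending (index 2), which is one reason evaluations commonly report a length-normalized accuracy alongside the raw one. In practice `token_logprobs` would be replaced by a call into an actual pretrained model.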

Mentioned in 1 episode

  1. 033
    Echo: The Paper Arguing You Never Needed a KV Cache for Retrieval