Definition
A benchmark that tests whether a model can predict the final word of a passage when that word can be inferred from the passage as a whole but not from the last sentence alone.
A long-context word prediction benchmark requiring broad discourse understanding; a standard zero-shot evaluation for pretrained language models.
Also called: Lambada
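Example
A minimal sketch of the last-word accuracy computation described above, assuming whitespace-separated words and exact string matching. The names split_passage, last_word_accuracy, and first_word_baseline are hypothetical, as are the toy passages; real evaluations score a language model's own token-level predictions, which this sketch does not reproduce.

```python
from typing import Callable, Iterable, Tuple


def split_passage(passage: str) -> Tuple[str, str]:
    """Split a passage into (context, target); target is the final word."""
    parts = passage.rsplit(None, 1)  # split once on the last run of whitespace
    if len(parts) != 2:
        raise ValueError("passage needs at least two words")
    return parts[0], parts[1]


def last_word_accuracy(
    passages: Iterable[str], predict: Callable[[str], str]
) -> float:
    """Fraction of passages whose final word the predictor gets exactly right."""
    correct = 0
    total = 0
    for passage in passages:
        context, target = split_passage(passage)
        if predict(context).strip() == target:
            correct += 1
        total += 1
    return correct / total if total else 0.0


def first_word_baseline(context: str) -> str:
    """Toy stand-in for a model: always guess the first word of the context."""
    return context.split()[0]


# Invented passages for illustration only, not taken from the benchmark.
toy_passages = [
    "keys were missing all morning until Anna found the keys",
    "Tom made tea and handed a cup to Mia",
]

accuracy = last_word_accuracy(toy_passages, first_word_baseline)
print(accuracy)
```

Any callable that maps a context string to a single predicted word can be passed in place of first_word_baseline, for instance a wrapper around a pretrained model's greedy decoding.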