Longformer · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A Transformer variant that handles long documents efficiently. Most tokens attend only to a sliding window of nearby tokens, while a few designated tokens can see the whole input.

As stated in the literature

A long-context attention architecture that combines a sliding window with selected global tokens so that attention cost scales linearly with sequence length. It reappears as a building block in efficient attention mechanisms designed by AI agents.
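A rough count of query-key scores shows where the linear scaling comes from. This is illustrative arithmetic under stated assumptions, not figures from the paper: n tokens, a total window of w tokens (w/2 on each side of each position), g global tokens, and edge effects ignored.

$$
\begin{aligned}
\text{full attention:}\quad & n^2 \\
\text{window + global:}\quad & \le n(w+1) + 2gn = O\big(n(w+g)\big) \\
n = 4096,\ w = 512,\ g = 0:\quad & 4096^2 = 16{,}777{,}216 \ \text{vs.}\ \le 4096 \cdot 513 = 2{,}101{,}248
\end{aligned}
$$

That is roughly 8× fewer scores. Doubling n to 8192 quadruples the full-attention count (67,108,864) but only doubles the windowed bound (4,202,496), because w and g stay fixed as the input grows.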

Why it matters: Its sliding-window pattern is a recurring ingredient in efficient attention designs, including ones that AI systems now assemble automatically.

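For example, Longformer might let each position attend to the 512 nearest tokens (256 on each side), plus a handful of designated 'global' tokens, such as the question tokens in question answering or the [CLS] token in classification. Global tokens attend to every position, and every position attends to them.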
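The pattern in the example above can be sketched as a boolean attention mask. This is a minimal illustration assuming NumPy; the function name `longformer_style_mask` and its parameters are hypothetical, not part of any library. It also builds a dense n × n matrix purely for readability, whereas a practical long-context implementation computes only the banded and global entries, since a dense mask would bring back the quadratic memory cost.

```python
import numpy as np


def longformer_style_mask(n: int, window: int, global_idx: list[int]) -> np.ndarray:
    """Return a boolean (n, n) mask: mask[i, j] is True if query i may attend to key j.

    Hypothetical helper for illustration only. `window` is the total window size,
    split as window // 2 tokens on each side; `global_idx` lists the positions
    treated as global tokens.
    """
    half = window // 2
    i = np.arange(n)[:, None]  # query positions as a column
    j = np.arange(n)[None, :]  # key positions as a row
    # Local sliding window: each position sees neighbours within `half` steps, and itself.
    mask = np.abs(i - j) <= half
    g = np.asarray(global_idx, dtype=int)
    mask[g, :] = True  # global tokens attend to every position
    mask[:, g] = True  # every position attends to the global tokens
    return mask


# Usage: 16 tokens, a window of 4 (2 on each side), token 0 acting like [CLS].
mask = longformer_style_mask(n=16, window=4, global_idx=[0])
print(int(mask.sum()), 16 * 16)  # 100 of 256 entries are allowed
```

Because both the window and the number of global tokens are fixed, the count of allowed entries grows roughly in proportion to n rather than n².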

Heard on the show

“Longformer-style windowed attention.”

Episode 053 — An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script

Mentioned in 1 episode

053
An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script

Related terms

agent · attention · long-context · sliding window attention · token