Definition
Models or tasks where the input can be tens or hundreds of thousands of words long.
The regime where transformer input sequences are large enough that attention cost, memory pressure, and degradation of recall become primary engineering and modeling concerns.