Definition
The mechanism a language model uses to decide which earlier words to pay close attention to when figuring out the next word.
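A weighted-sum operation in a transformer that mixes information across tokens. The weights come from a softmax over scaled dot products between query and key vectors, which are learned projections of the token representations; the same weights are then applied to value vectors. The core building block of the transformer architecture behind most modern LLMs.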
Also called: Attention
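Example
A minimal sketch of the weighted sum described above, in plain Python with no external libraries. The names softmax and attention and the causal flag are illustrative choices, not taken from any particular library. The sketch assumes the query, key, and value vectors are already given; in a real transformer they would come from learned linear projections of the token representations, and several attention heads would typically run in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=True):
    """Single-head scaled dot-product attention over lists of vectors.

    queries, keys, values: one vector (list of floats) per token.
    causal: if True, token i may only attend to tokens 0..i
            (the "earlier words" case used for next-word prediction).
    """
    scale = math.sqrt(len(keys[0]))  # divide scores by sqrt(key dimension)
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        visible = range(i + 1) if causal else range(len(keys))
        # Similarity of this query to each visible key.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / scale
                  for j in visible]
        weights = softmax(scores)
        # Output is the weighted sum of the visible value vectors.
        out = [sum(w * values[j][d] for w, j in zip(weights, visible))
               for d in range(dim)]
        outputs.append(out)
    return outputs
```

With causal=True the first token can only attend to itself, so its output equals its own value vector; later tokens receive a blend of their own and earlier value vectors, weighted toward the keys that best match their query.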