Definition
The feed-forward block in each transformer layer that sits alongside the attention sub-block and stores much of the model's knowledge.
The multi-layer perceptron sub-block in a transformer layer, which applies the same feed-forward transformation to each token position independently and holds much of the model's factual content.
Also called: MLP layers, MLP
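
Example
A minimal NumPy sketch of the position-wise computation described above, for illustration only. The two-matrix layout, the GELU activation, the 4x hidden width, and all names (`mlp_block`, `w_in`, `w_out`, `d_model`, `d_hidden`) are assumptions chosen for the sketch; real models vary in activation, gating, and bias terms, and the surrounding residual connection and normalization are omitted here.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU (an assumed activation; ReLU and gated
    # variants are also common in practice).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp_block(x, w_in, b_in, w_out, b_out):
    """Apply the same two-layer MLP to every token position.

    x has shape (seq_len, d_model); the result has the same shape.
    """
    hidden = gelu(x @ w_in + b_in)   # expand: (seq_len, d_hidden)
    return hidden @ w_out + b_out    # project back: (seq_len, d_model)


# Hypothetical sizes and random weights, purely for demonstration.
rng = np.random.default_rng(0)
seq_len, d_model, d_hidden = 4, 8, 32
x = rng.normal(size=(seq_len, d_model))
w_in = rng.normal(size=(d_model, d_hidden)) * 0.1
b_in = np.zeros(d_hidden)
w_out = rng.normal(size=(d_hidden, d_model)) * 0.1
b_out = np.zeros(d_model)

y = mlp_block(x, w_in, b_in, w_out, b_out)

# Running a single token through the block on its own gives the same
# result as its row in the full-sequence output.
y_single = mlp_block(x[2:3], w_in, b_in, w_out, b_out)
```

Because no step mixes rows of `x`, each token's output depends only on that token's input vector; moving information between positions is left to attention.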