Definition
A Transformer variant that approximates softmax attention with random-feature kernel maps, so that time and memory cost grow linearly, rather than quadratically, with sequence length.
An efficient-attention architecture based on kernel approximation with positive random features, reused as a building block in agent-designed solutions for LRA (Long Range Arena).
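To make the definition concrete, the following is a minimal NumPy sketch of attention approximated with positive random features. It is an illustration under stated assumptions, not the reference implementation of any particular library: the function names (`positive_random_features`, `linear_attention`, `softmax_attention`), the single-head unbatched shapes, the plain Gaussian projection matrix, and the default of 256 features are all choices made for this example. It relies on the identity exp(q·k) = E_w[exp(w·q - |q|²/2) · exp(w·k - |k|²/2)] for w drawn from a standard normal, which yields strictly positive features.

```python
import numpy as np


def positive_random_features(x, w):
    """Map rows of x (n, d) to positive features (n, m) using projections w (m, d)."""
    m = w.shape[0]
    proj = x @ w.T                                          # (n, m)
    half_sq_norm = 0.5 * np.sum(x * x, axis=-1, keepdims=True)
    # exp(w.x - |x|^2 / 2) is always > 0, so the attention weights stay positive.
    return np.exp(proj - half_sq_norm) / np.sqrt(m)


def linear_attention(q, k, v, num_features=256, seed=0):
    """Approximate softmax attention without forming the (n, n) score matrix."""
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_features, d))              # assumed: i.i.d. Gaussian rows
    scale = d ** -0.25                                      # splits the 1/sqrt(d) factor between q and k
    q_feat = positive_random_features(q * scale, w)         # (n, m)
    k_feat = positive_random_features(k * scale, w)         # (n, m)
    kv = k_feat.T @ v                                       # (m, d_v), cost linear in n
    k_sum = k_feat.sum(axis=0)                              # (m,)
    numer = q_feat @ kv                                     # (n, d_v)
    denom = q_feat @ k_sum                                  # (n,)
    return numer / denom[:, None]


def softmax_attention(q, k, v):
    """Exact quadratic-cost softmax attention, used here only for comparison."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The key design point in this sketch is the order of multiplication: computing `k_feat.T @ v` first gives an (m, d_v) summary whose size does not depend on sequence length, so the overall cost is O(n · m · d) instead of O(n² · d). Approximation quality depends on the number of random features and on the magnitude of the queries and keys.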