Definition
A small simple classifier trained on a model's internal states to test what information they contain.
A linear classifier trained on frozen intermediate activations to detect whether a particular concept is linearly decodable from the representation.
Also called: linear probing, linear classifier, linear probes, probe, probes