Definition
Linear probing trains a linear classifier on a model’s frozen internal activations to test whether a target concept is linearly readable from them. It’s the cheapest interpretability tool that actually tells you something, and a sanity check for stronger claims.