Concept · 1 episode

Linear Probing


Definition

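Linear probing trains a linear classifier (for example, logistic regression) on a model's frozen internal activations to test whether a target concept is linearly readable from them. If a single hyperplane separates the activations by concept label on held-out data, the information is present in a linearly accessible form at that layer. It's about the cheapest interpretability tool that actually tells you something, and a useful sanity check before making stronger claims.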
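A minimal sketch of the procedure, under stated assumptions. In practice the inputs would be hidden states cached from one layer of the frozen model; here they are replaced by synthetic vectors from a hypothetical helper, `make_fake_activations`, which plants a binary concept along one direction. The probe is plain logistic regression trained by gradient descent, chosen for illustration; any linear classifier would do. Only the probe's weights are learned, and the activations are never modified.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_fake_activations(n, d, shift=2.0, seed=0):
    """Hypothetical stand-in for cached activations of a frozen model:
    Gaussian noise plus a +/- shift along one unit direction that
    encodes a binary concept (assumption for illustration only)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in raw))
    direction = [v / norm for v in raw]
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(0, 1) + sign * shift * u for u in direction])
        y.append(label)
    return X, y


def train_linear_probe(X, y, lr=0.5, epochs=100):
    """Logistic-regression probe trained by full-batch gradient descent.
    Only w and b are updated; the activations X stay fixed (frozen)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - t  # gradient of cross-entropy w.r.t. the logit
            for i in range(d):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    # A positive logit means the probe predicts label 1.
    hits = sum(
        int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (t == 1))
        for x, t in zip(X, y)
    )
    return hits / len(X)


X, y = make_fake_activations(n=600, d=16)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
w, b = train_linear_probe(X_train, y_train)
probe_acc = accuracy(w, b, X_test, y_test)
# Majority-class baseline: the score you get without reading the activations.
majority = max(sum(y_test), len(y_test) - sum(y_test)) / len(y_test)
print(f"probe accuracy {probe_acc:.2f} vs majority baseline {majority:.2f}")
```

The number to read is held-out accuracy relative to a baseline, not training accuracy. A probe well above the baseline suggests the concept is linearly decodable at that layer; on its own it does not show the model actually relies on that information, which is why probing works best as a sanity check rather than the final word.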

Episodes covering this