Definition
A probing method that reads a language model's hidden states to predict whether its current statement is true.
Statement-Accuracy Prediction based on Language Model Activations — a supervised probe over hidden states classifying truthfulness of model assertions.