Glossary · Term

SAPLMA

Definition

A probing method that reads a language model's hidden states to predict whether its current statement is true.

Short for Statement Accuracy Prediction based on Language Model Activations: a supervised probe over hidden states that classifies a model's assertions as true or false.
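To make "supervised probe over hidden states" concrete, here is a minimal sketch, not the published method. It assumes the activations have already been extracted and labeled; since no model is loaded here, the vectors are synthetic stand-ins, and a plain logistic-regression probe is swapped in for whatever classifier a real setup would use. All names (`make_synthetic_activations`, `train_probe`, `predict_true_probability`) are hypothetical.

```python
import math
import random


def make_synthetic_activations(n, dim, rng):
    """Hypothetical stand-in for hidden states: vectors for true statements
    are shifted along one fixed direction, false ones the opposite way."""
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = true statement, 0 = false statement
        sign = 1.0 if label == 1 else -1.0
        vec = [sign * 0.5 * d + rng.gauss(0.0, 1.0) for d in direction]
        data.append((vec, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_true_probability(w, b, vec):
    """Probe output: estimated probability that the statement is true."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)


def train_probe(data, dim, epochs=50, lr=0.05):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            err = predict_true_probability(w, b, vec) - label
            for i in range(dim):
                w[i] -= lr * err * vec[i]
            b -= lr * err
    return w, b


def accuracy(w, b, data):
    hits = sum((predict_true_probability(w, b, v) >= 0.5) == (y == 1)
               for v, y in data)
    return hits / len(data)


rng = random.Random(0)
DIM = 16  # assumed toy width; real hidden states are far wider
train = make_synthetic_activations(400, DIM, rng)
test = make_synthetic_activations(200, DIM, rng)
w, b = train_probe(train, DIM)
print(f"held-out accuracy: {accuracy(w, b, test):.2f}")
```

In a real setting, each vector would be the hidden state a language model produces while processing a statement, and the labels would come from a dataset of statements known to be true or false.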

Mentioned in 1 episode

  1. 037
    Why Hallucination Detectors Miss Stale Facts: A Geometric Story About What Models Know But Don't Say