Definition
A measure of how varied a model's answers are in meaning, not just wording.
A hallucination-detection metric that clusters multiple sampled responses by semantic equivalence and measures entropy over the clusters.
A measure of how varied a model's answers are in meaning, not just wording.
A hallucination-detection metric that clusters multiple sampled responses by semantic equivalence and measures entropy over the clusters.