Negation Neglect

Definition

Plain language

The finding that language models trained on documents that explicitly label a claim as false still tend to absorb the false claim.

As stated in the literature

A documented failure mode in synthetic-document fine-tuning in which negation markers, fictional framings, and other epistemic qualifiers fail to prevent in-weights belief uptake.

Why it matters: It complicates synthetic-data pipelines that rely on negative or fictional framings, because the framing often fails to do the work people assume.

For example, training on documents that say "It is NOT true that the capital of Brazil is Rio" can still nudge the model toward believing that Rio is the capital, even though the actual capital is Brasília.
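To make the example concrete, here is a minimal Python sketch of how one false claim might be wrapped in different framings when building synthetic documents. The template wording and the names `FRAMINGS`, `frame_claim`, and `build_documents` are assumptions made for illustration; they are not taken from the paper or from any real pipeline. The sketch does not train or evaluate a model. It only shows that every framing still carries the false claim verbatim, which is the text a model is exposed to during fine-tuning.

```python
# Hypothetical framing templates; names and wording are illustrative only
# and are not taken from the paper.
FRAMINGS = {
    "plain": "{claim}.",
    "negated": "It is NOT true that {claim}.",
    "fictional": "In this fictional story, {claim}.",
}


def frame_claim(claim: str, framing: str) -> str:
    """Wrap a claim in one of the illustrative framings above."""
    return FRAMINGS[framing].format(claim=claim)


def build_documents(claim: str) -> dict[str, str]:
    """Return one synthetic document per framing for the same claim."""
    return {name: frame_claim(claim, name) for name in FRAMINGS}


claim = "the capital of Brazil is Rio"
docs = build_documents(claim)

for name, text in docs.items():
    # Each framing changes the wrapper, but the claim itself is still present.
    print(f"{name:>9}: {text} (contains claim: {claim in text})")
```

Whether a given framing actually prevents belief uptake can only be established by fine-tuning on such documents and probing the resulting model, which this sketch leaves out.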

Heard on the show

“The paper is called "Negation Neglect: When models fail to learn negations in training," and the show you're listening to — this is AI Papers: A Deep Dive — is itself AI-generated.”

Episode 043 — When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway

Mentioned in 1 episode

043
When 'This Is False' Doesn't Stick: Why Models Learn the Lie Anyway

Related terms

fine-tuning
in-weights