Glossary · Term

belief-flow

← all terms

Definition

When an AI agent talks itself into believing something false without any outside attacker.

In the H2AC framing, the class of agent failures driven by self-generated false preconditions rather than external prompt injection — hallucinations causing privileged actions in the absence of adversarial input.

Mentioned in 1 episode

  1. 062
    Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety