Glossary · Term

semantic hijacking

← all terms

Definition

An attack that doesn't smuggle in any instructions — it just tells the AI a convincing story.

A class of multi-agent attacks where adversarial payloads embed malicious requests inside operationally plausible narratives (e.g., fabricated incident reports) without any explicit instruction-injection tricks, exploiting auditor confidence.

Mentioned in 1 episode

  1. 058
    Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe