Glossary · Term

semantic attack

← all terms

Definition

An attack that doesn't smuggle in any instructions — it just tells the AI a convincing story.

A class of multi-agent attacks where adversarial payloads embed malicious requests inside operationally plausible narratives (e.g., fabricated incident reports), exploiting auditor confidence rather than instruction-injection tricks.

Also called: semantic hijacking

Mentioned in 1 episode

  1. 058
    Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe