Glossary · Term

prompt injection

← all terms

Definition

Sneaking instructions for an AI into text it processes, so it follows the attacker's commands.

An attack class where adversarial text in inputs or retrieved content causes an LLM to deviate from its intended behavior or system prompt.

Also called: prompt injections

Mentioned in 8 episodes

  1. 062
    Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety
  2. 061
    When Helpful Agents Go Sideways: A 404 Error, Campus Security, and Why Alignment Misses This
  3. 058
    Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
  4. 057
    How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
  5. 049
    An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
  6. 044
    How One Sentence and a Forged History Flip the Most Aligned Models
  7. 039
    When Smarter Agents Get Fooled by Three Extra Nodes in a Database
  8. 030
    Why Your AI Agent Won't Stop Working — and Each Model Falls for a Different Trap

Related concepts