Concept · 9 episode(s)

Prompt Injection

← all concepts

Definition

Prompt injection is an attack where adversarial instructions are smuggled into data that a model later reads — a web page, an email, a tool output — causing the model to ignore its real instructions and follow the injected ones. It’s the defining security problem of LLM agents.

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.