Concept · 19 episode(s)

Prompt Injection

Definition

Prompt injection is an attack where adversarial instructions are smuggled into data that a model later reads — a web page, an email, a tool output — causing the model to ignore its real instructions and follow the injected ones. It’s the defining security problem of LLM agents.

Episodes covering this

208
The Blank Space in Your AI Approval Box That Isn't Empty
Unicode TAG-Block Concealment of Tool-Metadata Payloads in the Model Context Protocol: An Approval-View Fidelity Gap Across Three Independent Server Implementations
· ·15 min·Jul 08, 2026
202
How Do You Know an AI Agent Actually Refused? Check the World, Not the Words
Safety Testing LLM Agents at Scale: From Risk Discovery to Evidence-Grounded Verification
Feng, Lin, Wen et al. · AntGroup / Hunan Institute of Advanced Technology·18 min·Jul 06, 2026
184
An AI Built an Undetectable Secret Channel, And Another AI Couldn't Find It
Tool Use Enables Undetectable Steganography in Multi-Agent LLM Systems
Rippin, Marshall, Africa et al. · Oxford University·19 min·Jun 30, 2026
164
The Summarizer That Quietly Deletes Your Agent's Safety Rules
Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents
Chen · Beijing Institute of Technology·28 min·Jun 23, 2026
149
When Cornering a Chatbot Makes It Lie: J.P. Morgan's Case for 'Playing Dead'
Is Your Agent Playing Dead? Deployed LLM Agents Exhibit Constraint-Evasive Fabrication and Thanatosis
Rodríguez, Pozanco, Borrajo · J.P. Morgan AI Research·23 min·Jun 16, 2026
146
How an Innocent README Can Freeze an AI Agent's Safety Check for an Hour
From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails
Zhou, Wang, Ma et al. · Hong Kong University of Science and Technology·26 min·Jun 15, 2026
143
When a Model Notices You Forged Its Own Words, And Why That Breaks Safety Tests
Prefill Awareness in Large Language Models
Wang, Mahajan, Africa et al. · Constellation / University of Wisconsin-Madison·24 min·Jun 12, 2026
113
What If a Prompt Injection Never Left? Attacks That Wait in Agent Memory
What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems
Xie, Liu, Zhang et al. · Institute of Information Engineering·27 min·Jun 04, 2026
105
The Trojan Is Your Agent's Memory: Why Single-Step Defenses Miss Persistent Attacks
From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors
Tan, Dou, Yang et al. · Gaoling School of Artificial Intelligence·26 min·Jun 01, 2026
102
How to Catch an AI Attack That No Single Conversation Reveals
Stateful Online Monitoring Catches Distributed Agent Attacks
Brown, Bhargav, Santhanam et al. · University of Pennsylvania·24 min·Jun 01, 2026
062
Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety
Hallucination as Exploit: Evidence-Carrying Multimodal Agents
Zhang, Zheng, Yang · Shenzhen University·24 min·May 20, 2026
058
Why Upgrading Your AI Auditor to a Smarter Model Can Make Your System Less Safe
The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure
Liu, Holz, Ye et al. · University of Chinese Academy of Sciences·32 min·May 19, 2026
057
How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
ADR: An Agentic Detection System for Enterprise Agentic AI Security
Li, Hu, Xu et al. · Uber Technologies·28 min·May 19, 2026
049
An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
Ambient Persuasion in a Deployed AI Agent: Unauthorized Escalation Following Routine Non-Adversarial Content Exposure
Cuadros, Maiga · Digital Epidemiology Laboratory·28 min·May 17, 2026
045
When a Frontier Model Talks Its Own Twin Into Climate Denial
LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs
Nogueira, Almeida, Bonás et al. · Maritaca AI·31 min·May 15, 2026
044
How One Sentence and a Forged History Flip the Most Aligned Models
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
Salgado · Independent Researcher·23 min·May 15, 2026
039
When Smarter Agents Get Fooled by Three Extra Nodes in a Database
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
Kereopa-Yorke, Diaz, Wright et al. · Microsoft·31 min·May 12, 2026
038
How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial
How LLMs Are Persuaded: A Few Attention Heads, Rerouted
Sun, Kong, Zhang et al. · Northeastern University·23 min·May 12, 2026
030
Why Your AI Agent Won't Stop Working — and Each Model Falls for a Different Trap
LoopTrap: Termination Poisoning Attacks on LLM Agents
Xu, Wang, Zhang et al. · Zhejiang University·30 min·May 09, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.