Oracle Poisoning · Glossary · AI Papers: A Deep Dive

Definition

Plain language

An attack where the database an AI agent consults gets quietly corrupted, so the agent confidently reasons from false facts.

As stated in the literature

An attack class targeting structured data sources (knowledge graphs, MCP-served databases) that LLM agents trust as observational ground truth, corrupting grounding without altering prompts, training, or tool behavior.

Why it matters: It exposes a blind spot in agent safety work that focuses on prompts and tools while treating data sources as automatically trustworthy.

For example, an attacker edits a corporate knowledge graph so an agent answering HR questions cites a fabricated company policy.

Heard on the show

“The paper is "Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning," from Ben Kereopa-Yorke and colleagues at Microsoft and UNSW Canberra.”

Episode 039 — When Smarter Agents Get Fooled by Three Extra Nodes in a Database

Mentioned in 1 episode

039
When Smarter Agents Get Fooled by Three Extra Nodes in a Database

Related terms

agent ground truth knowledge graph MCP