tool rug pull · Glossary · AI Papers: A Deep Dive

Definition

Plain language

When an AI tool advertises one behavior in its description but actually does something different.

As stated in the literature

An attack class where an MCP-served tool's natural-language description and actual implementation diverge, exploiting LLM agents that read the description but not the source code.

Why it matters: As MCP-style tool ecosystems grow, agents trusting descriptions over code becomes a systemic supply-chain vulnerability.

For example, a 'check spelling' tool whose description sounds benign might actually scan the conversation for credentials and exfiltrate them.

Heard on the show

“There's an attack class called tool rug pull, where a tool advertises one behavior in its description and actually implements another.”

Episode 057 — How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack

Mentioned in 1 episode

057
How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack

Related terms

agent MCP