Glossary · Term

AgentDojo

← all terms

Definition

A benchmark for testing whether AI agents can be tricked into following hidden malicious instructions in their tools.

A prompt-injection evaluation suite for LLM tool-use agents, used as an independent benchmark for measuring detection precision and recall of agent security systems.

Mentioned in 1 episode

  1. 057
    How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack