# AI Agent Traps

> **TL;DR.** AI agent traps are adversarial web content designed to manipulate, hijack, or weaponize autonomous AI agents against the users they serve.

- **Category:** AI / Security / Agents
- **Stage:** validating
- **Age:** 81 days
- **Origin date:** 2026-03-27
- **First detected:** 2026-04-20
- **Canonical URL:** https://earlyterms.com/term/ai-agent-traps
- **Sources:** 9 primary URLs

## Definition

AI agent traps are adversarial web content designed to manipulate, hijack, or weaponize autonomous AI agents against the users they serve. The phrase names a category, not a product: six attack families that turn an agent's own capabilities (browsing, memory, tool use) into the exfiltration path.

The term was coined by [Google DeepMind's March 2026 SSRN paper](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438) — Matija Franklin, Nenad Tomašev, Julian Jacobs, Joel Z. Leibo, and Simon Osindero published the first systematic taxonomy, documenting prompt-injection success rates up to 86% on the WASP benchmark and a Microsoft M365 Copilot case where one crafted email exfiltrated the agent's full privileged context.

## Example

On the WASP benchmark, plain-text prompt injections hidden in HTML comments, aria-labels, or CSS-masked text hijacked Agent behavior in 86% of scenarios. Adversarial images using least-significant-bit steganography — pixels invisibly carrying attacker instructions — made aligned vision-language models obey requests they would otherwise refuse.

## Analogy

You don't need to hack a self-driving car — repainting the stop sign is enough. Agent traps repaint the web.

## Why it's emerging now

Google DeepMind published the first complete taxonomy of attacks against autonomous agents on March 27, 2026 — six trap categories, 86%+ hijack rates. The paper lands as enterprise agents (M365 Copilot, Claude Code, Manus) move into inboxes, browsers, and wallets, giving defenders their first shared vocabulary for a risk scattered across prompt-injection tweets.

## Related terms

- *child:* prompt injection
- *child:* jailbreak
- *child:* RAG poisoning
- *child:* indirect prompt injection
- *related:* OWASP LLM Top 10
- *related:* agent harness
- *related:* managed agents
- *related:* ai agent identity
- *related:* Manus
- *related:* M365 Copilot
- *related:* hyperstition
- *alias:* AI 陷阱
- *alias:* agent traps

## Sources

1. [AI Agent Traps (SSRN, DeepMind, March 2026)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438)
2. [SecurityWeek — Google DeepMind Researchers Map Web Attacks Against AI Agents](https://www.securityweek.com/google-deepmind-researchers-map-web-attacks-against-ai-agents/)
3. [The Decoder — Six traps that can easily hijack autonomous AI agents in the wild](https://the-decoder.com/google-deepmind-study-exposes-six-traps-that-can-easily-hijack-autonomous-ai-agents-in-the-wild/)
4. [Bitcoin.com News — Hackers could weaponize AI agents against users](https://news.bitcoin.com/deepminds-ai-agent-traps-paper-maps-how-hackers-could-weaponize-ai-agents-against-users/)
5. [Security Boulevard — The Web Is Full of Traps and AI Agents Walk Right into Them](https://securityboulevard.com/2026/04/the-web-is-full-of-traps-and-ai-agents-walk-right-into-them/)
6. [Cybersecurity News — Hackers Hijack AI Agents Through Malicious Web Content](https://cybersecuritynews.com/hackers-hijack-ai-agents/)
7. [CoinTribune — Six Vulnerabilities of AI Agents, Including Crypto Crash Risk](https://www.cointribune.com/en/a-deepmind-study-highlights-six-major-vulnerabilities-of-ai-agents/)
8. [向阳乔木 @vista8 — Chinese breakdown of the paper](https://twitter.com/vista8/status/2046040021258678514)
9. [Hacker News discussion](https://news.ycombinator.com/item?id=47827863)

---
_Generated by EarlyTerms · https://earlyterms.com/term/ai-agent-traps_
