2026 AI Agent 安全实战指南 | Prompt 注入 · MCP 投毒 · 记忆污染防御

AI Agent Security Guide 2026 | Prompt Injection · MCP Poisoning · Memory Attack Defense

AI安全AI SecurityPrompt InjectionMCP供应链安全OWASPAgent

2026 年 2 月 28 日凌晨,一个自主运行的 AI 攻击代理锁定了麦肯锡的内部 AI 平台 Lilli。它没有使用任何人类凭证,没有社工攻击,甚至没有利用任何零日漏洞——它只是找到了 22 个未认证的 API 端点,然后通过一个 SQL 注入漏洞,在不到两小时内获取了 4650 万条明文聊天记录和 72.8 万份机密文件。

这不是科幻电影。这是 2026 年 AI Agent 安全的现实。

> 📌 TL;DR — AI Agent 正在成为新的攻击面。Prompt 注入、MCP 供应链投毒、记忆污染、多 Agent 级联失败……这些不再是实验室里的理论,而是正在真实发生的安全事件。本文基于 OWASP 2026 标准和真实案例,为你梳理威胁全景并给出实战防御方案。

---

📖 目录

- 为什么 AI Agent 是全新的攻击面
- Prompt 注入:从玩具到武器
- MCP 供应链:信任链的崩塌
- 记忆污染:沉睡的特洛伊木马
- 多 Agent 级联:一个沦陷,全线崩溃
- OWASP Top 10 for Agentic AI:行业共识
- 防御实战手册
- 写在最后

---

🎯 为什么 AI Agent 是全新的攻击面

传统的 LLM 应用(比如一个聊天机器人)本质上只是「输入 → 输出」。你给它一段文字,它返回一段文字。攻击面很有限。

但 AI Agent 完全不同。它能:

- 调用工具:读写数据库、发送邮件、执行代码、操作 API
- 持久记忆:记住上一次对话的内容,并在未来的决策中使用
- 自主规划:分解复杂任务,自行决定执行步骤
- 多 Agent 协作:和其他 Agent 通信、委托任务、共享上下文

这意味着一旦攻击者找到了操控 Agent 的方式,损害不再局限于「输出了一段不恰当的文字」——而是 Agent 会用真实的凭证、真实的权限,在真实的系统上执行攻击者想要的操作

IBM 2026 X-Force 报告显示,AI 驱动的攻击同比增长了 89%。而企业端只有 29% 表示已准备好保护其 Agent 部署。

---

🔓 Prompt 注入:从玩具到武器

什么是 Prompt 注入?

简单说:攻击者在 Agent 会处理的数据中嵌入隐藏指令,让 Agent 把数据当作命令来执行。

这听起来像是「逗 AI 玩」——但在有工具调用权限的 Agent 上,这就是 远程代码执行级别的漏洞

直接注入 vs 间接注入

| 类型 | 攻击方式 | 危险程度 |
|:-----|:---------|:---------|
| 直接注入 | 用户直接在对话中输入恶意 prompt | ⚠️ 中等——通常有输入过滤 |
| 间接注入 | 在网页、PDF、邮件、工具描述中嵌入隐藏指令,Agent 在处理这些数据时中招 | 🔴 极高——几乎无法预防 |

间接注入是 2026 年最危险的攻击向量。根据统计,73% 的生产环境 AI 部署 存在 prompt 注入漏洞。

真实案例

🔹 GitHub Copilot 劫持 (CVE-2025-53773, CVSS 9.6):攻击者在 Pull Request 描述中嵌入隐藏指令,当 GitHub Copilot 读取 PR 时触发远程代码执行。你以为只是在 review 代码,实际上你的 Agent 已经被劫持了。

🔹 SCADA 物理攻击:一封包含 PDF 附件的邮件,在白色背景上用白色文字和 base64 编码隐藏了指令。当 AI Agent 处理这封邮件时,它通过 MCP 向 SCADA 系统写入了控制指令,导致水泵意外启动——这是 AI Agent 造成的首个已知物理设备损坏事件

🔹 Microsoft 365 Copilot EchoLeak:零点击间接注入漏洞,攻击者无需任何用户交互即可通过 Copilot 静默访问和窃取企业数据。

---

⛓️ MCP 供应链:信任链的崩塌

MCP 是什么?为什么它重要?

Model Context Protocol (MCP) 是 2025-2026 年最热门的 AI 基础设施协议,它标准化了 AI 模型连接外部工具和数据源的方式。几乎所有主流 Agent 框架都在使用它。

但快速采用带来了巨大的安全隐患。在 2026 年 1 月到 2 月短短 60 天内,安全研究人员就报告了 超过 30 个 MCP 相关 CVE

三大攻击向量

1. 工具投毒 (Tool Poisoning)

攻击者修改 MCP 工具的描述,让 AI 模型误解工具的实际功能。模型以为自己在调用搜索功能,实际上在窃取数据。

研究人员在 WhatsApp MCP Server 上验证了这一攻击:通过篡改工具描述,Agent 被诱导导出了用户的完整聊天记录。

2. 供应链投毒

假冒或被篡改的工具混入 MCP 注册表。一个伪装成邮件集成的恶意 npm 包,悄悄将所有外发邮件副本发送到攻击者控制的地址。

2026 年的 Axios 供应链攻击更是影响了每周 8300 万次下载,攻击者利用遗留 token 绕过了 SLSA 溯源认证。

3. 关键 CVE 清单

| CVE | 影响 | CVSS |
|:----|:-----|:-----|
| CVE-2026-23744 | MCPJam Inspector RCE,零点击远程代码执行 | 9.8 |
| CVE-2026-32211 | Azure DevOps MCP 信息泄露,暴露 API key 和 token | 9.1 |
| CVE-2026-26118 | Microsoft MCP Server 工具劫持 | 8.8 |
| CVE-2026-5058 | AWS MCP Server RCE | 9.8 |
| CVE-2025-68143/44/45 | Anthropic mcp-server-git 链式漏洞,路径绕过 + RCE | 高危 |

BlueRock Security 分析了超过 7000 个 MCP 服务器,发现 36.7% 存在 SSRF 漏洞。根本原因不是什么高级零日——而是缺少输入验证、缺少认证、以及对工具描述的盲目信任。

---

🧠 记忆污染:沉睡的特洛伊木马

这可能是 2026 年最阴险的攻击类型。

它是怎么工作的?

与 prompt 注入不同(会话结束就失效),记忆污染攻击在 Agent 的长期记忆中植入恶意信息。这些信息会安静地待在那里,直到某个语义触发条件被满足,才会激活执行。

> ⚠️ 关键区别:Prompt 注入是会话级攻击(关了就好),记忆污染是持久性攻击(植入一次,永久有效)。

为什么传统防御无效?

因为记忆污染利用的不是 bug,而是功能本身。持久记忆的设计目的就是让 Agent 记住并使用之前的信息——攻击者只需要让 Agent「记住」错误的东西。

现有防御手段(工具合约、熔断器、输入过滤)检测的是恶意行为,而不是被污染的信念。Agent 以为自己在按照正确的记忆行事,但那些记忆已经被篡改了。

真实案例

🔹 医疗系统支付劫持:攻击者通过一张支持工单让 AI Agent「记住」某个供应商的发票应该转发到外部支付地址 Y。三周后,Agent 按照这条被污染的记忆,自动将一笔付款转到了攻击者的账户。

🔹 Microsoft 调查发现:在一个数据源中,60 天内发现了 50 例 AI 记忆污染案例。攻击者注入未经授权的「事实」到 AI 助手的记忆中,AI 将这些信息视为合法的用户偏好。

🔹 MINJA 攻击论文:研究证明记忆注入攻击成功率超过 95%,攻击执行率达 70%。更令人担忧的是,被污染的 Agent 会主动为这些错误信念辩护,即使人类指出了问题。

OWASP 已将此列为 ASI06 (Memory & Context Poisoning),分类为高持久性、极高检测难度

---

🌐 多 Agent 级联:一个沦陷,全线崩溃

2026 年的 AI 系统越来越多地采用多 Agent 架构——多个 Agent 分工协作、互相通信、委托任务。这带来了效率,也带来了新的风险:级联失败

Galileo AI 的研究发现:在模拟的多 Agent 系统中,单个被攻陷的 Agent 在 4 小时内污染了 87% 的下游决策。被污染的信念通过「合法」的 Agent 间通信传播,传统的事件响应速度根本跟不上。

这就像传染病——一个 Agent 感染了错误信念,它和其他 Agent 的每一次正常交互都在传播病毒。而且因为通信是「合法的」,防火墙和访问控制完全看不到问题。

---

📋 OWASP Top 10 for Agentic AI:行业共识

OWASP 在 2025 年底发布、2026 年持续更新的 Agentic AI Top 10,是目前最权威的 Agent 安全框架。由超过 100 位行业专家参与制定,经 NIST、Microsoft AI Red Team 和 AWS 同行评审:

| 编号 | 风险 | 核心问题 |
|:-----|:-----|:---------|
| ASI01 | Agent 目标劫持 | Agent 无法可靠区分指令和数据 |
| ASI02 | 工具滥用与利用 | 过度授权的工具被 Agent 误用 |
| ASI03 | 身份与权限滥用 | Agent 借用用户身份执行越权操作 |
| ASI04 | 供应链漏洞 | 运行时动态加载的组件被篡改 |
| ASI05 | 输出验证缺失 | Agent 输出未经验证直接执行 |
| ASI06 | 记忆与上下文污染 | 持久记忆被植入恶意信息 |
| ASI07 | 多 Agent 信任滥用 | Agent 间通信缺乏验证 |
| ASI08 | 不充分的人类监督 | 关键操作无人审批 |
| ASI09 | 过度自主 | Agent 自由度超出其安全边界 |
| ASI10 | 不透明的 Agent 行为 | 无法审计 Agent 的决策过程 |

OWASP 提出的核心设计原则是 「最小代理权」(Least Agency)

> 自主性是一个需要被「赢得」的能力,而不是默认设置。

---

🛡️ 防御实战手册

理论讲够了,下面是你现在就能做的 7 件事:

1. 最小权限,最小工具

原则:Agent 只能访问完成当前任务所必需的工具和数据
实践:为每个 Agent 创建独立身份,精确到工具级别的权限控制
反面:让 Agent 用管理员 token 访问所有 API → 灾难

2. 输入净化——所有来源

不只是用户输入。网页内容、PDF、邮件附件、MCP 工具描述、RAG 检索结果……所有进入 Agent 的数据都需要清洗

检查清单:
✓ 过滤隐藏的 Unicode 字符和不可见指令
✓ 对工具描述进行哈希校验
✓ 对 RAG 检索结果做来源验证
✓ 对邮件/PDF 内容做 prompt 注入检测

3. 锁定 MCP 供应链

四步法:
1. 运行 uvx mcp-scan@latest 扫描你的 MCP 配置
2. 禁用工具调用的自动批准(human-in-the-loop)
3. 锁定 MCP server 版本(不要用 latest)
4. 将 MCP server 容器化隔离

4. 记忆防护

五层架构:
1. 记忆分区:不同来源的记忆物理隔离
2. 上下文隔离:外部数据不能直接写入核心记忆
3. 溯源追踪:每条记忆都要记录来源和写入时间
4. 时间衰减:非关键记忆定期过期
5. 行为监控:检测记忆内容的异常变化

5. 人在回路——关键操作必须审批

高风险操作:发送敏感数据、修改系统配置、执行代码、金融交易
→ 必须有人类明确确认才能执行
→ 不能让 Agent 自己批准自己的操作

6. 运行时行为监控

仅靠输入输出过滤不够。你需要监控 Agent 在运行时实际做了什么

监控点:
- 调用了哪些工具?频率是否异常?
- 访问了哪些数据?是否超出任务范围?
- 权限是否发生了变化?
- Agent 间通信的内容是否异常?

7. 供应链审计

SBOM(软件物料清单)扫描所有 Agent 框架和依赖
密码学验证所有第三方组件
监控依赖项的 CVE 更新
定期审计 MCP 插件来源和版本

---

✨ 写在最后

2026 年的 AI Agent 安全形势可以用一句话概括:

> ✨ Agent 的能力有多强,它的攻击面就有多大。 我们不能因为害怕风险就不用 Agent,但也不能在没有安全防护的情况下盲目部署。

EU AI Act 高风险 AI 系统合规原定 2026 年 8 月 2 日生效,现已推迟至 2027 年 12 月 2 日(Digital Omnibus 协议)。但罚款力度不变,违规罚款最高可达 3500 万欧元或全球年营收的 7%。安全不再只是技术问题——它是合规问题、商业问题、甚至是生存问题。

好消息是:大多数已知的 Agent 安全事件,根因都不是什么高深莫测的零日漏洞。缺少认证、缺少输入验证、过度授权、盲目信任——这些都是我们已经知道怎么解决的问题。

真正的挑战在于:在享受 Agent 带来的强大能力的同时,保持对安全的敬畏和投入。

这场游戏才刚刚开始。

---

> 🔄 2026-04-23 更新
>
> Vercel / Context AI 供应链攻击(4 月 19 日):本文发布仅两天后,Vercel 即遭遇了一起教科书式的 AI 供应链攻击。一名 Vercel 员工使用第三方 AI 工具 Context AI 并授予了其 Google Drive 完整读取权限。攻击者通过 Lumma Stealer 恶意软件入侵了 Context AI 的 OAuth token,进而获取了 Vercel 的内部数据。泄露数据在 BreachForums 上以 200 万美元标价出售。此案例完美印证了本文「MCP 供应链」章节的警告——第三方 AI 集成的信任链一旦断裂,后果极其严重。
>
> Mercor AI 供应链攻击(4 月 2 日):价值 100 亿美元的 AI 数据标注公司 Mercor(客户包括 Anthropic、OpenAI、Meta)也因 LiteLLM 开源库被攻击导致数据泄露,Lapsus$ 声称获取了 4TB 数据。
>
> 这两起事件再次强调:锁定 AI 工具链的供应链安全,不是可选项,而是生存必需。

> 🔄 2026-04-25 更新
>
> Anthropic MCP SDK 设计级 RCE 漏洞(4 月下旬披露):安全研究人员发现 Anthropic 官方 MCP SDK(Python、TypeScript、Java、Rust 全线受影响)存在系统性远程代码执行漏洞,影响超过 7000 个公开可访问的 MCP Server 和总计超过 1.5 亿次下载的软件包。相关 CVE 包括 CVE-2025-49596(MCP Inspector)、CVE-2026-22252(LibreChat)等。Anthropic 表示该行为属于"预期设计",不会修改协议架构。这使得本文"锁定 MCP 供应链"的建议更加紧迫——不要盲目信任官方 SDK 的默认行为,必须在应用层添加额外的输入验证和权限控制。
>
> OpenClaw AI Agent RCE 危机(CVE-2026-25253, CVSS 8.8):拥有 13.5 万 GitHub Stars 的 OpenClaw 平台曝出一键 RCE 漏洞和两个命令注入漏洞。更严重的是,其技能市场 ClawHub 中约 12%(341/2857)的技能已被恶意软件污染。这是 2026 年首次大规模 AI Agent 供应链投毒事件。
>
> LMDeploy SSRF 13 小时闪电利用(CVE-2026-33626):开源 LLM 部署工具 LMDeploy 的 SSRF 漏洞在披露后仅 13 小时即被利用,用于窃取云凭证。攻击时间窗口之短令人警醒——AI 基础设施的漏洞修复必须以小时计。

> 🔄 2026-04-27 更新
>
> Flowise CVSS 10.0 被大规模利用(CVE-2025-59528):开源 AI 工作流平台 Flowise 的 CustomMCP 节点存在代码注入漏洞(满分 10.0),允许未认证攻击者通过精心构造的请求获得完整系统权限。尽管补丁早在 2025 年 9 月已发布(v3.0.6),但 VulnCheck 于 4 月 7 日检测到活跃利用行为,目前仍有 12,000-15,000 个暴露实例。攻击者可直接访问 OpenAI、Anthropic 等 API 密钥以及所有已配置的数据库凭证。这是 Flowise 的第三个被实际利用的 CVE。
>
> Windsurf IDE 零点击 RCE(CVE-2026-30615):在所有受 MCP "设计缺陷"影响的 IDE 中,Windsurf 是最严重的——攻击者无需任何用户交互,只要 IDE 加载一个包含恶意 MCP 配置的项目即可触发代码执行。Cursor、VS Code、Claude Code、Gemini-CLI 也受影响,但 Windsurf 是唯一的零点击案例。
>
> OX Security 完整披露(4 月 15 日):OX Security 发布的详细报告确认 MCP 架构缺陷影响超过 20 万台服务器1.5 亿次下载,已分配 14 个 CVE。Anthropic 的官方回应仍然是「预期行为」,拒绝修改协议架构。OX Security 指出,一个协议层面的改动(manifest-only 执行或命令白名单)就能立即保护所有下游项目。
>
> 最后更新:2026-04-27

> 🔄 2026-05-25 更新
>
> EU AI Act 高风险合规延期:2026 年 5 月 7 日 Digital Omnibus 协议将 Annex III 高风险 AI 系统合规截止日推迟至 2027 年 12 月 2 日。透明度义务推迟至 2026 年 12 月 2 日。禁止性 AI 实践(2025 年 2 月已生效)和 AI 素养要求不受影响。
>
> 最后更新:2026-05-25


On February 28, 2026, an autonomous AI attack agent targeted McKinsey's internal AI platform, Lilli. It used no human credentials, no social engineering, no zero-day exploits — it simply found 22 unauthenticated API endpoints, then exploited a SQL injection vulnerability to gain full read-write access to the production database. In under two hours, it accessed 46.5 million plaintext chat messages and 728,000 confidential files.

This isn't science fiction. This is the reality of AI Agent security in 2026.

> 📌 TL;DR — AI Agents are becoming the new attack surface. Prompt injection, MCP supply chain poisoning, memory corruption, multi-agent cascade failures... these are no longer theoretical risks in research labs — they're real security incidents happening right now. This guide maps the threat landscape based on OWASP 2026 standards and real-world cases, with a practical defense playbook.

---

📖 Table of Contents

- Why AI Agents Are a Fundamentally New Attack Surface
- Prompt Injection: From Toy to Weapon
- MCP Supply Chain: When Trust Breaks Down
- Memory Poisoning: The Sleeper Trojan
- Multi-Agent Cascade: One Falls, All Collapse
- OWASP Top 10 for Agentic AI: Industry Consensus
- Defense Playbook
- Final Thoughts

---

🎯 Why AI Agents Are a Fundamentally New Attack Surface

Traditional LLM applications — like a chatbot — are essentially "input → output" systems. You send text, you get text back. The attack surface is limited.

AI Agents are fundamentally different. They can:

- Call tools: Read/write databases, send emails, execute code, operate APIs
- Persistent memory: Remember past conversations and use that context in future decisions
- Autonomous planning: Break down complex tasks and decide execution steps on their own
- Multi-agent collaboration: Communicate with other agents, delegate tasks, share context

This means once an attacker finds a way to manipulate an agent, the damage isn't limited to "inappropriate text output" — the agent will use real credentials, real permissions, on real systems to execute whatever the attacker wants.

IBM's 2026 X-Force report shows AI-driven attacks grew 89% year-over-year. Yet only 29% of enterprises report being prepared to secure their agent deployments.

---

🔓 Prompt Injection: From Toy to Weapon

What Is Prompt Injection?

In simple terms: attackers embed hidden instructions in data that an agent processes, causing the agent to treat data as commands.

This might sound like "playing tricks on AI" — but on an agent with tool-calling permissions, this is effectively a remote code execution vulnerability.

Direct vs. Indirect Injection

| Type | Attack Vector | Risk Level |
|:-----|:-------------|:-----------|
| Direct | User inputs malicious prompt directly in conversation | ⚠️ Medium — usually filtered |
| Indirect | Hidden instructions embedded in web pages, PDFs, emails, tool descriptions; agent triggers them while processing data | 🔴 Critical — nearly impossible to fully prevent |

Indirect injection is the most dangerous attack vector of 2026. Statistics show 73% of production AI deployments have prompt injection vulnerabilities.

Real-World Cases

🔹 GitHub Copilot Hijack (CVE-2025-53773, CVSS 9.6): Attackers embedded hidden instructions in PR descriptions. When GitHub Copilot read the PR, it triggered remote code execution. You thought you were reviewing code — your agent was already compromised.

🔹 SCADA Physical Attack: An email with a PDF attachment used white text on white background with base64 encoding to hide instructions. When the AI agent processed the email, it wrote control commands to a SCADA system via MCP, causing unexpected pump activation — the first known physical equipment damage caused by an AI agent.

🔹 Microsoft 365 Copilot EchoLeak: A zero-click indirect injection vulnerability that allowed attackers to silently access and exfiltrate enterprise data through Copilot without any user interaction.

---

⛓️ MCP Supply Chain: When Trust Breaks Down

What Is MCP and Why Does It Matter?

The Model Context Protocol (MCP) is the hottest AI infrastructure protocol of 2025-2026, standardizing how AI models connect to external tools and data sources. Nearly every major agent framework uses it.

But rapid adoption has created massive security blind spots. In just 60 days between January and February 2026, security researchers reported over 30 MCP-related CVEs.

Three Attack Vectors

1. Tool Poisoning

Attackers modify an MCP tool's description to make the AI model misunderstand what the tool actually does. The model thinks it's calling a search function — it's actually exfiltrating data.

Researchers demonstrated this on the WhatsApp MCP Server: by tampering with tool descriptions, the agent was tricked into exporting users' complete chat histories.

2. Supply Chain Poisoning

Fake or compromised tools infiltrate MCP registries. A malicious npm package disguised as an email integration silently forwarded copies of all outbound emails to an attacker-controlled address.

The 2026 Axios supply chain attack affected 83 million weekly downloads, with attackers using legacy tokens to bypass SLSA provenance attestations.

3. Critical CVEs

| CVE | Impact | CVSS |
|:----|:-------|:-----|
| CVE-2026-23744 | MCPJam Inspector RCE, zero-click | 9.8 |
| CVE-2026-32211 | Azure DevOps MCP info disclosure, API keys exposed | 9.1 |
| CVE-2026-26118 | Microsoft MCP Server tool hijacking | 8.8 |
| CVE-2026-5058 | AWS MCP Server RCE | 9.8 |
| CVE-2025-68143/44/45 | Anthropic mcp-server-git chain, path bypass + RCE | High |

BlueRock Security analyzed over 7,000 MCP servers and found 36.7% were vulnerable to SSRF. The root causes weren't exotic zero-days — they were missing input validation, absent authentication, and blind trust in tool descriptions.

---

🧠 Memory Poisoning: The Sleeper Trojan

This might be the most insidious attack type of 2026.

How Does It Work?

Unlike prompt injection (which ends with the session), memory poisoning plants malicious information in an agent's long-term memory. This information sits dormant until a semantic trigger condition is met, then activates.

> ⚠️ Key distinction: Prompt injection is session-scoped (close the window and it's gone). Memory poisoning is persistent (planted once, effective indefinitely).

Why Traditional Defenses Fail

Because memory poisoning exploits a feature, not a bug. Persistent memory is designed so agents remember and use past information — attackers just need the agent to "remember" the wrong things.

Existing defenses (tool contracts, circuit breakers, I/O filtering) detect malicious actions, not corrupted beliefs. The agent thinks it's acting on correct memories, but those memories have been tampered with.

Real-World Cases

🔹 Healthcare Payment Hijack: An attacker used a support ticket to make an AI agent "remember" that vendor invoices from Account X should route to external payment address Y. Three weeks later, the agent followed this poisoned memory and automatically redirected a payment to the attacker's account.

🔹 Microsoft Investigation: Found 50 AI memory poisoning cases in a single data source over 60 days. Attackers injected unauthorized "facts" into AI assistants' memory, which the AI treated as legitimate user preferences.

🔹 MINJA Attack Research: Demonstrated memory injection success rates over 95% with 70% attack execution rates. Most alarmingly, poisoned agents actively defended their false beliefs even when humans pointed out the problems.

OWASP has classified this as ASI06 (Memory & Context Poisoning) with high persistence and very high detection difficulty.

---

🌐 Multi-Agent Cascade: One Falls, All Collapse

AI systems in 2026 increasingly adopt multi-agent architectures — multiple agents collaborating, communicating, and delegating tasks. This brings efficiency, but also a new risk: cascading failures.

Galileo AI's research found that in simulated multi-agent systems, a single compromised agent poisoned 87% of downstream decision-making within 4 hours. Corrupted beliefs propagated through "legitimate" inter-agent communication, spreading faster than traditional incident response could contain.

It's like a contagion — one agent infected with false beliefs spreads the virus through every normal interaction with other agents. And because the communication is "legitimate," firewalls and access controls see nothing wrong.

---

📋 OWASP Top 10 for Agentic AI: Industry Consensus

Released in late 2025 and continuously updated through 2026, the OWASP Agentic AI Top 10 is the most authoritative agent security framework available. Developed by over 100 industry experts and peer-reviewed by NIST, Microsoft AI Red Team, and AWS:

| ID | Risk | Core Issue |
|:---|:-----|:-----------|
| ASI01 | Agent Goal Hijack | Agents can't reliably separate instructions from data |
| ASI02 | Tool Misuse & Exploitation | Over-permitted tools misused by agents |
| ASI03 | Identity & Privilege Abuse | Agents leverage user identity for unauthorized actions |
| ASI04 | Supply Chain Vulnerabilities | Runtime dynamically-loaded components get tampered |
| ASI05 | Missing Output Validation | Agent outputs executed without verification |
| ASI06 | Memory & Context Poisoning | Persistent memory planted with malicious info |
| ASI07 | Multi-Agent Trust Abuse | Inter-agent communication lacks verification |
| ASI08 | Insufficient Human Oversight | Critical operations without human approval |
| ASI09 | Excessive Autonomy | Agent freedom exceeds its security boundary |
| ASI10 | Opaque Agent Behavior | Agent decision process can't be audited |

The core design principle OWASP advocates is "Least Agency":

> Autonomy is a capability that should be earned, not a default setting.

---

🛡️ Defense Playbook

Enough theory. Here are 7 things you can do right now:

1. Minimum Privileges, Minimum Tools

Principle: Agents should only access tools and data required for the current task
Practice: Create independent identities per agent with tool-level permission controls
Anti-pattern: Letting agents use admin tokens to access all APIs → disaster

2. Sanitize Inputs — From ALL Sources

Not just user input. Web content, PDFs, email attachments, MCP tool descriptions, RAG retrieval results... all data entering the agent needs to be sanitized.

Checklist:
✓ Filter hidden Unicode characters and invisible instructions
✓ Hash-verify tool descriptions
✓ Verify provenance of RAG retrieval results
✓ Run prompt injection detection on email/PDF content

3. Lock Down the MCP Supply Chain

Four steps:
1. Run uvx mcp-scan@latest to scan your MCP configuration
2. Disable auto-approval for tool calls (human-in-the-loop)
3. Pin MCP server versions (don't use latest)
4. Containerize and isolate MCP servers

4. Memory Protection

Five-layer architecture:
1. Memory partitioning: physically isolate memories from different sources
2. Context isolation: external data can't write directly to core memory
3. Provenance tracking: every memory entry records its source and timestamp
4. Temporal decay: non-critical memories expire periodically
5. Behavioral monitoring: detect anomalous changes in memory content

5. Human-in-the-Loop — Critical Operations Require Approval

High-risk operations: sending sensitive data, modifying system config, 
executing code, financial transactions
→ Must have explicit human confirmation before execution
→ Agents must not be able to approve their own operations

6. Runtime Behavior Monitoring

Input/output filtering alone isn't enough. You need to monitor what agents actually do at runtime:

Monitoring points:
- Which tools were called? Is the frequency abnormal?
- What data was accessed? Does it exceed task scope?
- Have permissions changed?
- Is inter-agent communication content anomalous?

7. Supply Chain Auditing

SBOM scanning for all agent frameworks and dependencies
Cryptographic verification of all third-party components
Monitor dependency CVE updates
Regular audits of MCP plugin sources and versions

---

✨ Final Thoughts

The AI Agent security landscape in 2026 can be summed up in one sentence:

> ✨ The more capable the agent, the larger its attack surface. We can't stop using agents out of fear — but we can't blindly deploy them without security either.

The EU AI Act's high-risk AI compliance deadline was originally August 2, 2026 but has been deferred to December 2, 2027 (Digital Omnibus agreement). However, the penalty structure remains unchanged, with fines up to 35 million EUR or 7% of global annual revenue. Security is no longer just a technical issue — it's a compliance issue, a business issue, and potentially an existential one.

The good news: most known agent security incidents trace back to fundamentals we already know how to solve — missing authentication, absent input validation, excessive permissions, blind trust.

The real challenge is: maintaining security discipline while embracing the immense power that agents provide.

This game has just begun.

---

> 🔄 Updated 2026-04-23
>
> Vercel / Context AI Supply Chain Attack (April 19): Just two days after this article was published, Vercel suffered a textbook AI supply chain attack. A Vercel employee used third-party AI tool Context AI with full Google Drive read access. Attackers compromised Context AI's OAuth tokens via Lumma Stealer malware, gaining access to Vercel's internal data. The leaked database was listed for $2M on BreachForums. This case perfectly validates the "MCP Supply Chain" section of this article — when the trust chain of third-party AI integrations breaks, the consequences are severe.
>
> Mercor AI Supply Chain Attack (April 2): Mercor, a $10B AI data labeling company serving Anthropic, OpenAI, and Meta, was also breached through the LiteLLM open-source library. Lapsus$ claims to have obtained 4TB of data.
>
> These incidents reinforce: securing your AI tool supply chain is not optional — it's a survival requirement.

> 🔄 Updated April 25, 2026
>
> Anthropic MCP SDK Design-Level RCE (Late April disclosure): Security researchers discovered systemic RCE vulnerabilities in Anthropic's official MCP SDK across Python, TypeScript, Java, and Rust, affecting over 7,000 publicly accessible MCP Servers and software packages totaling over 150 million downloads. Related CVEs include CVE-2025-49596 (MCP Inspector) and CVE-2026-22252 (LibreChat). Anthropic stated this behavior is "by design" and will not modify the protocol architecture. This makes the article's "lock down MCP supply chain" advice even more urgent — do not blindly trust the default behavior of official SDKs; add application-layer input validation and access controls.
>
> OpenClaw AI Agent RCE Crisis (CVE-2026-25253, CVSS 8.8): OpenClaw, with 135K GitHub Stars, disclosed a one-click RCE vulnerability and two command injection flaws. Worse, approximately 12% (341/2,857) of skills in its ClawHub marketplace were found to be compromised with malware. This is the first large-scale AI Agent supply chain poisoning event of 2026.
>
> LMDeploy SSRF Exploited in 13 Hours (CVE-2026-33626): An SSRF vulnerability in LMDeploy was exploited within just 13 hours of disclosure for cloud credential theft. The shrinking exploitation window is alarming — AI infrastructure vulnerability patching must be measured in hours, not days.

> 🔄 Updated 2026-04-27
>
> Flowise CVSS 10.0 Under Active Exploitation (CVE-2025-59528): Flowise's CustomMCP node contains a code injection vulnerability (maximum CVSS 10.0) allowing unauthenticated attackers to gain full system access via crafted requests. Despite a patch being available since September 2025 (v3.0.6), VulnCheck detected active exploitation starting April 7, with 12,000-15,000 instances still exposed. Attackers gain access to OpenAI, Anthropic, and other API keys plus all configured database credentials. This is the third actively exploited Flowise CVE.
>
> Windsurf IDE Zero-Click RCE (CVE-2026-30615): Among all IDEs affected by the MCP "design flaw," Windsurf is the most severe — attackers need zero user interaction. Simply loading a project with a malicious MCP configuration triggers code execution. Cursor, VS Code, Claude Code, and Gemini-CLI are also affected, but Windsurf is the only zero-click case.
>
> OX Security Full Disclosure (April 15): OX Security's detailed report confirmed the MCP architectural flaw affects over 200,000 servers and 150 million downloads, with 14 CVEs assigned. Anthropic's official response remains "expected behavior," declining to modify the protocol architecture. OX Security notes that a single protocol-level change (manifest-only execution or command allowlist) would instantly protect all downstream projects.
>
> Last updated: 2026-04-27

> 🔄 2026-05-25 Update
>
> EU AI Act high-risk compliance deferred: The May 7, 2026 Digital Omnibus agreement postponed the Annex III high-risk AI compliance deadline to December 2, 2027. Transparency obligations deferred to December 2, 2026. Prohibited AI practices (effective Feb 2025) and AI literacy requirements are unaffected.
>
> Last updated: 2026-05-25