基于工具调用视角的编码智能体红队测试：一项实证安全评估 (Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment)

Coding agents powered by large language models are becoming central modules of modern IDEs, helping users perform complex tasks by invoking tools. While powerful, tool invocation opens a substantial attack surface. Prior work has demonstrated attacks against general-purpose and domain-specific agents, but none have focused on the security risks of tool invocation in coding agents. To fill this gap, we conduct the first systematic red-teaming of six popular real-world coding agents: Cursor, Claude Code, Copilot, Windsurf, Cline, and Trae. Our red-teaming proceeds in two phases. In Phase 1, we perform prompt leakage reconnaissance to recover system prompts. We discover a general vulnerability, ToolLeak, which allows malicious prompt exfiltration through benign argument retrieval during tool invocation. In Phase 2, we hijack the agent's tool-invocation behavior using a novel two-channel prompt injection in the tool description and return values, achieving remote code execution (RCE). We adaptively construct payloads using security information leaked in Phase 1. In emulation across five backends, our method outperforms baselines on Claude-Sonnet-4, Claude-Sonnet-4.5, Grok-4, and GPT-5. On real agents, our approach succeeds on 19 of 25 agent-LLM pairs, achieving leakage on every agent using Claude and Grok backends. For tool-invocation hijacking, we obtain RCE on every tested agent-LLM pair, with our two-channel method delivering the highest success rate. We provide case studies on Cursor and Claude Code, analyze security guardrails of external and built-in tools, and conclude with practical defense recommendations.

翻译：由大型语言模型驱动的编码智能体正逐渐成为现代集成开发环境（IDE）的核心模块，通过调用工具协助用户完成复杂任务。尽管功能强大，工具调用却引入了显著的攻击面。先前研究已展示针对通用及领域专用智能体的攻击，但尚未有工作聚焦于编码智能体中工具调用的安全风险。为填补这一空白，我们对六款流行的真实世界编码智能体进行了首次系统性红队测试：Cursor、Claude Code、Copilot、Windsurf、Cline 和 Trae。我们的红队测试分为两个阶段。在第一阶段，我们执行提示词泄露侦察以恢复系统提示词，发现了一个通用漏洞 ToolLeak，该漏洞允许攻击者通过工具调用期间的良性参数检索实现恶意提示词窃取。在第二阶段，我们通过在工具描述和返回值中注入新颖的双通道提示词，劫持智能体的工具调用行为，实现了远程代码执行（RCE）。我们利用第一阶段泄露的安全信息自适应构建攻击载荷。在五种后端模型的仿真测试中，我们的方法在 Claude-Sonnet-4、Claude-Sonnet-4.5、Grok-4 和 GPT-5 上均优于基线方法。在实际智能体测试中，我们的方法在 25 组智能体-大语言模型配对中成功入侵了 19 组，并在所有使用 Claude 和 Grok 后端的智能体上实现了信息泄露。针对工具调用劫持，我们在所有测试的智能体-大语言模型配对上均获得了 RCE 权限，其中双通道注入方法的成功率最高。我们提供了针对 Cursor 和 Claude Code 的案例分析，探讨了外部工具与内置工具的安全防护机制，并最终提出了切实可行的防御建议。