LLM-based coding agents extend their capabilities through third-party agent skills, which are distributed via open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space (file writes, shell commands, and network requests) despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in the code examples and configuration templates of skill documentation. Because agents reuse these examples during normal tasks, the payload executes without any explicit prompt. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATT&CK categories. Across four frameworks and five models, DDIPE achieves bypass rates of 11.6% to 33.5%, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% of skills evade both static detection and model alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.
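To make the static-analysis side of this concrete, the following is a minimal defensive sketch of how the code examples in a skill's documentation might be screened before an agent reuses them. It is not the detector evaluated in this work: the function names (`extract_code_blocks`, `scan_skill_doc`) and the three indicator patterns are hypothetical, chosen only for illustration, and the sketch assumes the documentation is Markdown with fenced code blocks.

```python
import re

# Three backticks, built programmatically so this example can itself
# live inside a fenced block without closing it early.
TICKS = "`" * 3

# Matches a fenced block and captures its body (everything after the
# info-string line, up to the next fence).
FENCE = re.compile(TICKS + r"[^\n]*\n(.*?)" + TICKS, re.DOTALL)

# Hypothetical indicator patterns, for illustration only. A real scanner
# would need a far richer and better-validated rule set.
SUSPICIOUS_PATTERNS = {
    "pipe-to-shell": re.compile(r"(curl|wget)\b[^\n|]*\|\s*(ba|z)?sh\b"),
    "credential-path": re.compile(r"~/\.ssh/|~/\.aws/credentials"),
    "encoded-exec": re.compile(r"base64\s+(-d|--decode)\b[^\n]*\|\s*(ba|z)?sh\b"),
}


def extract_code_blocks(markdown):
    """Return the bodies of all fenced code blocks in a Markdown string."""
    return [match.group(1) for match in FENCE.finditer(markdown)]


def scan_skill_doc(markdown):
    """Return (block_index, pattern_name) pairs for every suspicious match."""
    findings = []
    for index, block in enumerate(extract_code_blocks(markdown)):
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(block):
                findings.append((index, name))
    return findings
```

A pattern-based check of this kind only catches payloads that match a known signature. That limitation is in line with the observation above that a residual share of skills evades static detection.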