Skills are a key enabling component of agentic AI. While they extend agents' capabilities, they also introduce new attack surfaces. In this work, we investigate one such attack surface by demonstrating dynamic malicious skills. By embedding malicious instructions in a skill's natural-language documentation (e.g., SKILL.md), an attacker can induce an agent to inject malicious logic into an otherwise benign skill during execution. We evaluate this attack on agentic frameworks such as OpenHands and Claude Code and show that dynamic malicious skills can introduce a range of malicious behaviors at runtime with non-trivial success rates. To mitigate this vulnerability, we propose a system-level defense that prevents skills from being modified at runtime by placing them on read-only mounts enforced by the operating system kernel. Our evaluation demonstrates that this defense blocks dynamic malicious skills while preserving the functionality of benign skills.
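To make the defense idea concrete, the following is a minimal sketch of one way to obtain a kernel-enforced read-only view of a skill directory: bind-mounting it read-only into the container in which the agent runs. The use of Docker, the paths, the image name, and the helper names `readonly_skill_mount_args` and `docker_run_command` are assumptions made for illustration; they are not taken from the evaluated system.

```python
from pathlib import PurePosixPath


def readonly_skill_mount_args(skill_dir: str, target: str = "/skills") -> list[str]:
    """Build `docker run` arguments that bind-mount skill_dir read-only.

    Hypothetical helper: the container runtime and the paths are assumptions.
    """
    source = PurePosixPath(skill_dir)
    if not source.is_absolute():
        # Docker bind mounts with --mount require an absolute source path.
        raise ValueError("skill_dir must be an absolute path")
    # The `readonly` option asks the kernel to reject writes under `target`,
    # so an agent inside the container cannot rewrite the skill's files.
    return ["--mount", f"type=bind,source={source},target={target},readonly"]


def docker_run_command(image: str, skill_dir: str) -> list[str]:
    """Assemble a full command line; `image` is a placeholder name."""
    return ["docker", "run", "--rm", *readonly_skill_mount_args(skill_dir), image]


if __name__ == "__main__":
    # Print the command rather than executing it, so the sketch has no side effects.
    print(" ".join(docker_run_command("agent-image", "/opt/skills")))
```

Because the restriction is applied at mount time rather than through file permissions inside the container, the agent cannot lift it by changing permissions on the skill files; the agent can still read and execute the skill, which is what benign use requires.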