Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented behaviors. This abstraction introduces an under-measured attack surface: skill-based prompt injection, where poisoned skills can steer agents away from user intent and safety policies. In practice, naive injections often fail because the malicious intent is too explicit or drifts too far from the original skill, leading agents to ignore or refuse them; existing attacks are also largely hand-crafted. We propose the first automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills in a realistic tool environment, and an Evaluate Agent that logs action traces (e.g., tool calls and file operations) and verifies whether targeted malicious behaviors occurred. We also propose a malicious payload hiding strategy that conceals adversarial operations in auxiliary scripts while injecting optimized inducement prompts to trigger tool execution. Extensive experiments across diverse coding-agent settings and real-world software engineering tasks show that our method consistently achieves high attack success rates under realistic settings.
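The verification step attributed to the Evaluate Agent (logging action traces such as tool calls and file operations, then checking whether a targeted behavior occurred) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the framework itself: the `Action` schema, the `TraceLog` class, the `verify` helper, and the example rule `writes_outside`, which flags file writes that resolve outside a workspace directory.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """One logged step of an agent run (hypothetical schema)."""
    kind: str    # "tool_call" or "file_op"
    name: str    # tool name or operation, e.g. "bash" or "write"
    target: str  # command line or file path


@dataclass
class TraceLog:
    """Append-only record of what the agent actually did."""
    actions: List[Action] = field(default_factory=list)

    def record(self, kind: str, name: str, target: str) -> None:
        self.actions.append(Action(kind, name, target))


def verify(trace: TraceLog, rule: Callable[[Action], bool]) -> List[Action]:
    """Return every logged action that matches the targeted-behavior rule."""
    return [a for a in trace.actions if rule(a)]


def writes_outside(workspace: str) -> Callable[[Action], bool]:
    """Example rule (an assumption): a file write that escapes the workspace."""
    root = posixpath.normpath(workspace)

    def rule(a: Action) -> bool:
        if a.kind != "file_op" or a.name != "write":
            return False
        # Resolve relative paths against the workspace, collapsing "..".
        path = posixpath.normpath(posixpath.join(root, a.target))
        return not (path == root or path.startswith(root + "/"))

    return rule


if __name__ == "__main__":
    trace = TraceLog()
    trace.record("tool_call", "bash", "pytest -q")
    trace.record("file_op", "write", "src/app.py")
    trace.record("file_op", "write", "../outside.txt")
    for hit in verify(trace, writes_outside("/work/repo")):
        print("flagged:", hit)
```

Judging success from the recorded trace rather than from the agent's textual reply matches the abstract's emphasis on verifying whether behaviors actually occurred. The same kind of check could also serve as an audit rule when reviewing third-party skills.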