LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this extends agent capabilities to new domains, it also creates an increasingly complex agent supply chain, opening new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely used LLM agents to injections delivered through skill files. SkillInject contains 202 injection-task pairs, with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security, in terms of harmful instruction avoidance, and utility, in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable, with attack success rates of up to 80% on frontier models, often executing extremely harmful instructions including data exfiltration, destructive actions, and ransomware-like behavior. Our findings further suggest that this problem will not be solved by model scaling or simple input filtering; robust agent security will instead require context-aware authorization frameworks. Our benchmark is available at https://www.skill-inject.com/.