LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an increasingly complex agent supply chain, opening new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs, with attacks ranging from overtly malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security, in terms of harmful instruction avoidance, and utility, in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable, with attack success rates of up to 80% on frontier models, which often execute extremely harmful instructions including data exfiltration, destructive actions, and ransomware-like behavior. Our findings further suggest that this problem will not be solved by model scaling or simple input filtering; robust agent security will instead require context-aware authorization frameworks. Our benchmark is available at https://www.skill-inject.com/.