LLM agents increasingly rely on skills, which encapsulate reusable capabilities as progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models and improve performance on specialized tasks. Their quality and ease of dissemination are driving the emergence of a skill economy: free skill marketplaces already report 90,368 published skills, while paid marketplaces report more than 2,000 listings and over $100,000 in creator earnings. Yet this growing marketplace also creates a new attack surface, because adversaries can interact with public agents to extract hidden proprietary skill content. We present the first empirical study of black-box skill stealing against LLM agent systems. To study this threat, we first derive an attack taxonomy from prior prompt-stealing methods and build an agent that automatically generates stealing prompts. This agent starts from model-generated seed prompts, expands them through scenario rationalization and structure injection, and enforces diversity via embedding-based filtering. The result is a reproducible pipeline for evaluating agent systems. We evaluate such attacks across 3 commercial agent architectures and 5 LLMs. Our results show that agent skills can be extracted with only 3 interactions, posing a serious copyright risk. To mitigate this threat, we design defenses at three stages of the agent pipeline: input, inference, and output. Although these defenses achieve strong results, the attack remains inexpensive and readily automatable, so an adversary can launch repeated attempts with different variants, and a single successful attempt is enough to compromise the protected skill. Overall, our findings suggest that these copyright risks are largely overlooked across proprietary agent ecosystems. We therefore advocate for more robust defense strategies that provide stronger protection guarantees.
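As a rough illustration of the diversity step in the prompt generation pipeline, the sketch below greedily keeps a candidate only if its similarity to every previously kept candidate stays below a threshold. The abstract does not specify the embedding model, the similarity measure, or the threshold, so the bag-of-words `embed` function, the cosine similarity, the `max_similarity` value of 0.8, and all function names here are assumptions chosen to keep the example self-contained. The candidate strings are neutral placeholders.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-embedding model (assumption): word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filter_diverse(candidates: list[str], max_similarity: float = 0.8) -> list[str]:
    """Greedily keep candidates that are not too similar to any kept one.

    max_similarity is a hypothetical threshold, not a value from the paper.
    """
    kept: list[str] = []
    kept_vecs: list[Counter] = []
    for text in candidates:
        vec = embed(text)
        if all(cosine(vec, other) < max_similarity for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


# Placeholder candidates: the second is a near-duplicate of the first.
candidates = [
    "candidate prompt alpha about audit logging",
    "candidate prompt alpha about audit logging please",
    "a different candidate focused on table formatting",
]
diverse = filter_diverse(candidates)
```

In this toy run the near-duplicate second candidate is dropped and the other two are kept. A real pipeline would presumably swap `embed` for a learned sentence-embedding model, and the greedy order-dependent selection shown here is only one of several reasonable filtering strategies.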