ToolTweak: An Attack on Tool Selection in LLM-based Agents

As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities. These agents typically select tools from growing databases or marketplaces to solve user tasks, creating implicit competition among tool providers and developers for visibility and usage. In this paper, we show that this selection process harbors a critical vulnerability: by iteratively manipulating tool names and descriptions, adversaries can systematically bias agents toward selecting specific tools, gaining an unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%, with strong transferability between open-source and closed-source models. Beyond individual tools, we show that such attacks cause distributional shifts in tool usage, revealing risks to fairness, competition, and security in emerging tool ecosystems. To mitigate these risks, we evaluate two defenses, paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more evenly. All code will be open-sourced upon acceptance.
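The abstract does not spell out ToolTweak's optimization procedure, so the following is only a minimal sketch of the general idea it describes: iteratively editing a tool's metadata against a black-box selector and keeping edits that raise the tool's selection rate. Everything here is a hypothetical stand-in: the keyword-overlap softmax "agent", the functions `select_probs`/`greedy_tweak`, the tool names, queries, and candidate phrases. A real attack would query an actual LLM agent, would likely have an LLM propose the edits, and (per the abstract) would also modify tool names, whereas this toy only appends words to the description.

```python
import math


# Hypothetical toy "agent": scores each tool by how many distinct query words
# appear in its name or description, then converts scores into selection
# probabilities with a softmax. A real agent would ask an LLM instead.
def tool_tokens(name, description):
    return set((name.replace("_", " ") + " " + description).lower().split())


def selection_probs(query, tools, temperature=1.0):
    q = set(query.lower().split())
    scores = {n: len(q & tool_tokens(n, d)) for n, d in tools.items()}
    z = sum(math.exp(s / temperature) for s in scores.values())
    return {n: math.exp(s / temperature) / z for n, s in scores.items()}


def selection_rate(target, tools, queries):
    # Expected fraction of queries for which the toy agent picks `target`.
    return sum(selection_probs(q, tools)[target] for q in queries) / len(queries)


def greedy_tweak(target, tools, queries, phrases, max_rounds=5):
    """Black-box hill climbing: each round, append the candidate phrase that
    most increases the target's selection rate; stop when nothing helps."""
    tools = dict(tools)
    best = selection_rate(target, tools, queries)
    history = [best]
    for _ in range(max_rounds):
        best_phrase = None
        for phrase in phrases:
            trial = dict(tools)
            trial[target] = tools[target] + " " + phrase
            rate = selection_rate(target, trial, queries)
            if rate > best:
                best, best_phrase = rate, phrase
        if best_phrase is None:
            break  # no candidate edit improves the selection rate
        tools[target] = tools[target] + " " + best_phrase
        history.append(best)
    return tools[target], history


# Hypothetical, functionally similar tools competing for the same queries.
tools = {
    "weather_now": "get current weather for a city",
    "forecast_pro": "weather forecast for any location",
    "climate_api": "historical climate and weather data",
}
queries = [
    "what is the weather in paris",
    "weather forecast for tomorrow",
    "current temperature in tokyo",
]
phrases = ["best", "current temperature forecast", "tomorrow paris tokyo"]

desc, history = greedy_tweak("climate_api", tools, queries, phrases)
print([round(r, 2) for r in history])  # → [0.21, 0.4, 0.61]
```

Because the toy scorer rewards nothing but keyword overlap, padding the target's description with query-like words is enough to lift its expected selection rate from about 0.21 to about 0.61 while the competing tools stay unchanged. This mirrors, in miniature, the kind of shift in tool usage the abstract describes, but it says nothing about how real LLM agents respond to such edits.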