Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval

Large Language Models (LLMs) have recently demonstrated strong capabilities in tool use, yet progress in tool retrieval remains hindered by incomplete and heterogeneous tool documentation. To address this challenge, we introduce Tool-DE, a new benchmark and framework that systematically enriches tool documentation with structured fields to enable more effective tool retrieval, together with two dedicated models, Tool-Embed and Tool-Rank. We design a scalable document expansion pipeline that leverages both open- and closed-source LLMs to generate, validate, and refine enriched tool profiles at low cost, producing large-scale corpora with 50k instances for embedding-based retrievers and 200k for rerankers. On top of this data, we develop two models specifically tailored for tool retrieval: Tool-Embed, a dense retriever, and Tool-Rank, an LLM-based reranker. Extensive experiments on ToolRet and Tool-DE demonstrate that document expansion substantially improves retrieval performance, with Tool-Embed and Tool-Rank achieving new state-of-the-art results on both benchmarks. We further analyze the contribution of individual fields to retrieval effectiveness, as well as the broader impact of document expansion on both training and evaluation. Overall, our findings highlight both the promise and limitations of LLM-driven document expansion, positioning Tool-DE, along with the proposed Tool-Embed and Tool-Rank, as a foundation for future research in tool retrieval.
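
To make the core idea concrete, here is a minimal sketch of why enriching a terse tool description with structured fields can help retrieval. It is an illustration under stated assumptions, not the Tool-DE pipeline: the field names (`function`, `tags`), the two toy tools, and the helper functions are all hypothetical, and a simple bag-of-words cosine similarity stands in for a dense retriever such as Tool-Embed.

```python
import math
import re
from collections import Counter

# Toy corpus. The "function" and "tags" fields are hypothetical examples of
# structured fields an LLM might add; they are not the schema used by Tool-DE.
TOOLS = [
    {
        "name": "fx_rates",
        "description": "Returns data for a currency pair.",
        "function": "Convert currency using exchange rates",
        "tags": ["currency", "exchange"],
    },
    {
        "name": "geo_lookup",
        "description": "Returns data for a place.",
        "function": "Look up current weather forecast and temperature for a city",
        "tags": ["weather", "forecast"],
    },
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(tool):
    """Concatenate the original description with the added structured fields."""
    parts = [tool["name"], tool["description"]]
    for field in ("function", "tags"):
        value = tool.get(field)
        if value:
            parts.append(" ".join(value) if isinstance(value, list) else value)
    return " ".join(parts)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, tools, use_expansion):
    """Return (tool name, score) pairs sorted by descending similarity."""
    query_vec = Counter(tokenize(query))
    scored = []
    for tool in tools:
        text = expand(tool) if use_expansion else tool["description"]
        scored.append((tool["name"], cosine(query_vec, Counter(tokenize(text)))))
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    query = "weather forecast paris"
    print("original docs:", rank(query, TOOLS, use_expansion=False))
    print("expanded docs:", rank(query, TOOLS, use_expansion=True))
```

With the original descriptions, the query shares no terms with either tool, so the two are indistinguishable. Once the hypothetical fields are appended, `geo_lookup` is the only tool with a nonzero score. A learned dense retriever would not depend on exact term overlap in this way, but the sketch shows the kind of signal that a terse description leaves out.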
