Augmenting Large Language Models (LLMs) with external tools has significantly enhanced their capacity to solve complex problems. However, despite ongoing advances in the parsing capabilities of LLMs, including every available tool in the prompt at once remains impractical because the number of external tools is so large. It is therefore essential to provide LLMs with a precise set of tools tailored to the specific task, with attention to both quantity and quality. Current tool retrieval methods focus primarily on refining the ranked list of tools and then package a fixed number of top-ranked tools directly as the tool set. Because the optimal number of tools varies from task to task, these approaches often fail to equip LLMs with the optimal tool set before execution: the resulting set contains redundant or unsuitable tools, which impede immediate access to the most relevant ones. This paper addresses the challenge of recommending precise tool sets for LLMs. We introduce the problem of tool recommendation, define its scope, and propose a novel Precision-driven Tool Recommendation (PTR) approach. PTR first captures an initial, concise set of tools by leveraging historical tool bundle usage, then dynamically adjusts that set through tool matching, and finally applies a multi-view-based tool addition. Additionally, we present a new dataset, RecTools, and a metric, TRACC, designed to evaluate the effectiveness of tool recommendation for LLMs. We further validate our design choices through comprehensive experiments, demonstrating promising accuracy on two open benchmarks and on our RecTools dataset.
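To make the contrast between fixed top-k packaging and a variable-size tool set concrete, the following is a minimal toy sketch, not the PTR implementation. It loosely mirrors the three steps named in the abstract (start from a historical bundle, adjust it by matching tools against the task, then add tools for what is still missing). Every name here (`recommend_toolset`, `capabilities`, `history`, the example tools) is hypothetical, similarity is plain Jaccard overlap on keyword sets, and the multi-view addition is simplified to a single ranked list.

```python
def jaccard(a, b):
    """Overlap between two keyword sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k(ranked_tools, k):
    """Fixed-size baseline: always return the k highest-ranked tools."""
    return ranked_tools[:k]


def recommend_toolset(needs, history, capabilities, ranked_tools):
    """Toy variable-size recommendation (all inputs are assumed formats).

    needs:        set of functionality keywords the task requires
    history:      list of (past_needs, bundle) pairs from earlier tasks
    capabilities: dict mapping tool name -> set of functionality keywords
    ranked_tools: tool names ordered by some retriever, best first
    """
    # Step 1: start from the historical bundle whose past needs match best.
    _, bundle = max(history, key=lambda item: jaccard(needs, item[0]),
                    default=(set(), ()))

    # Step 2: tool matching. Keep only bundle tools that serve a current need.
    kept = [t for t in bundle if capabilities.get(t, set()) & needs]
    covered = set().union(*(capabilities[t] for t in kept)) & needs

    # Step 3: addition. Fill each uncovered need with the best-ranked match.
    for need in sorted(needs):
        if need in covered:
            continue
        for tool in ranked_tools:
            if need in capabilities.get(tool, set()):
                kept.append(tool)
                covered |= capabilities[tool] & needs
                break
    return kept


# Hypothetical example data.
capabilities = {
    "geocoder": {"location"},
    "weather_api": {"weather"},
    "currency_api": {"currency"},
    "flight_search": {"flights"},
    "translator": {"translation"},
}
history = [
    ({"location", "weather"}, ("geocoder", "weather_api")),
    ({"currency", "translation"}, ("currency_api", "translator")),
]
ranked_tools = ["weather_api", "flight_search", "geocoder",
                "currency_api", "translator"]

print(top_k(ranked_tools, 3))
# ['weather_api', 'flight_search', 'geocoder']
print(recommend_toolset({"weather"}, history, capabilities, ranked_tools))
# ['weather_api']
print(recommend_toolset({"location", "weather", "flights"},
                        history, capabilities, ranked_tools))
# ['geocoder', 'weather_api', 'flight_search']
```

For the single-need task, the fixed top-3 baseline returns two tools the task never uses, while the variable-size sketch returns one tool; for the three-need task it grows to three. In this toy setting that is the inefficiency the abstract attributes to fixed-size packaging.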