This paper presents Seal-Tools, a new tool-learning dataset of API-like tools generated by self-instruct. Seal-Tools not only offers a large number of tools but also includes instances that demonstrate their practical application. To generate data at scale while ensuring reliability, we propose a self-instruct method for generating tools and instances that allows precise control over the process. Seal-Tools also contains hard instances that call multiple tools to complete a task, some of which involve nested tool calls. For precise and comprehensive evaluation, we apply strict format control and design three metrics that cover different dimensions. Seal-Tools can therefore serve as a new benchmark for evaluating the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our fine-tuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data, and experimental results are available at https://github.com/fairyshine/Seal-Tools.
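As a rough illustration of what strict format control and a nested tool call could look like in practice, consider the following Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the instance schema, so the `api`/`parameters` keys, the `<ref:...>` placeholder syntax, the tool names `getCity` and `getWeather`, and the helper functions are all hypothetical. The tool-name F1 shown here is a generic score and is not claimed to be one of the three Seal-Tools metrics.

```python
import json

# Hypothetical gold annotation: two tool calls, the second nested on the first.
# The "<ref:0.city>" placeholder syntax is an assumption made for this sketch.
GOLD = [
    {"api": "getCity", "parameters": {"name": "Alice"}},
    {"api": "getWeather", "parameters": {"city": "<ref:0.city>"}},
]


def parse_calls(raw):
    """Return the list of calls if `raw` passes the strict format check, else None."""
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(calls, list):
        return None
    for call in calls:
        # Each call must be an object with exactly the two expected keys.
        if not isinstance(call, dict) or set(call) != {"api", "parameters"}:
            return None
        if not isinstance(call["api"], str) or not isinstance(call["parameters"], dict):
            return None
    return calls


def tool_f1(pred, gold):
    """F1 over tool names, treating each side as a set of names."""
    pred_names = {c["api"] for c in pred}
    gold_names = {c["api"] for c in gold}
    hits = len(pred_names & gold_names)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_names)
    recall = hits / len(gold_names)
    return 2 * precision * recall / (precision + recall)


def is_nested(calls):
    """True if any parameter value refers back to an earlier call's output."""
    return any(
        isinstance(value, str) and value.startswith("<ref:")
        for call in calls
        for value in call["parameters"].values()
    )
```

Under these assumptions, a model output that fails `parse_calls` would be counted as a format error before any tool-level score is computed, which is one plausible way to keep format correctness separate from tool selection in the evaluation.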