Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark

This paper presents a new tool learning dataset Seal-Tools, which contains self-instruct API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances which demonstrate the practical application of tools. Seeking to generate data on a large scale while ensuring reliability, we propose a self-instruct method to generate tools and instances, allowing precise control over the process. Moreover, our Seal-Tools contains hard instances that call multiple tools to complete the job, among which some are nested tool callings. For precise and comprehensive evaluation, we use strict format control and design three metrics from different dimensions. Therefore, Seal-Tools can serve as a new benchmark to evaluate the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our finetuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data and experiment results are available at https://github.com/fairyshine/Seal-Tools .

翻译：本文提出一种新的工具学习数据集Seal-Tools，包含自指令生成的API类工具。该数据集不仅提供大量工具，还包含展示工具实际应用场景的实例。为在大规模生成数据的同时确保可靠性，我们提出一种自指令方法来自动生成工具及其实例，从而实现对生成流程的精确控制。此外，Seal-Tools包含需要调用多个工具完成任务的困难实例，其中部分实例涉及嵌套工具调用。为实现精确全面的评估，我们采用严格格式控制，并从不同维度设计三项指标。因此，Seal-Tools可作为评估大语言模型工具调用能力的新基准。最后，我们在Seal-Tools上评估了多个主流大语言模型及我们微调后的模型，结果表明现有系统仍远未完善。代码、数据及实验结果请访问 https://github.com/fairyshine/Seal-Tools 。

相关内容

TOOLS

关注 1

这个新版本的工具会议系列恢复了从1989年到2012年的50个会议的传统。工具最初是“面向对象语言和系统的技术”，后来发展到包括软件技术的所有创新方面。今天许多最重要的软件概念都是在这里首次引入的。2019年TOOLS 50+1在俄罗斯喀山附近举行，以同样的创新精神、对所有与软件相关的事物的热情、科学稳健性和行业适用性的结合以及欢迎该领域所有趋势和社区的开放态度，延续了该系列。官网链接：http://tools2019.innopolis.ru/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日