LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has been trained. We find that existing LLMs, including GPT-4 and open-source LLMs specifically fine-tuned for tool use, only reach a correctness rate in the range of 30% to 60%, far from reliable use in practice. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory. Specifically, STE leverages an LLM's 'imagination' to simulate plausible scenarios for using a tool, after which the LLM interacts with the tool to learn from its execution feedback. Both short-term and long-term memory are employed to improve the depth and breadth of the exploration, respectively. Comprehensive experiments on ToolBench show that STE substantially improves tool learning for LLMs under both in-context learning and fine-tuning settings, bringing a boost of 46.7% to Mistral-Instruct-7B and enabling it to outperform GPT-4. We also show effective continual learning of tools via a simple experience replay strategy.
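The abstract describes the exploration loop only in words, so the following is a minimal sketch of how imagination, trial and error, and the two kinds of memory might fit together. It is not the authors' implementation. Every name in it (`explore_tool`, `Trial`, the `toy_*` functions, the `weather` tool and its parameters) is hypothetical, and the LLM and the tool are replaced by deterministic stand-ins so that the control flow can be run as-is.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trial:
    query: str      # imagined user scenario
    call: str       # tool call the model attempted
    feedback: str   # what the tool returned
    success: bool   # whether the call executed correctly


def explore_tool(
    imagine: Callable[[List[str]], str],
    propose_call: Callable[[str, List[Trial]], str],
    execute: Callable[[str], Tuple[str, bool]],
    episodes: int = 3,
    max_trials: int = 3,
) -> List[Trial]:
    """Illustrative exploration loop (assumed structure, not the paper's code)."""
    long_term: List[Trial] = []  # kept across episodes: steers breadth
    for _ in range(episodes):
        # Imagination: ask for a scenario, showing what was already explored.
        query = imagine([t.query for t in long_term])
        short_term: List[Trial] = []  # reset per episode: supports depth
        for _ in range(max_trials):
            # Trial and error: retry using feedback from earlier attempts.
            call = propose_call(query, short_term)
            feedback, ok = execute(call)
            short_term.append(Trial(query, call, feedback, ok))
            if ok:
                break
        long_term.extend(short_term)
    return long_term


# --- Deterministic stand-ins for the LLM and the tool (all hypothetical) ---
CITIES = ["Paris", "Tokyo", "Lima"]


def toy_imagine(past_queries: List[str]) -> str:
    # Return the first scenario that has not been explored yet.
    for city in CITIES:
        query = f"What is the weather in {city}?"
        if query not in past_queries:
            return query
    return f"What is the weather in {CITIES[0]}?"


def toy_propose_call(query: str, short_term: List[Trial]) -> str:
    city = query.rstrip("?").split()[-1]
    if not short_term:
        return f"weather(location={city})"  # first guess uses a wrong parameter
    return f"weather(city={city})"          # corrected after seeing the error


def toy_execute(call: str) -> Tuple[str, bool]:
    if call.startswith("weather(city="):
        return "sunny", True
    return "error: unknown parameter 'location'", False


memory = explore_tool(toy_imagine, toy_propose_call, toy_execute)
for trial in memory:
    print(trial.success, trial.call, "->", trial.feedback)
```

In this toy run each episode fails once and then succeeds, which is the kind of execution feedback the abstract says the model learns from. How the collected trials are turned into in-context examples or fine-tuning data is not specified in the abstract and is left out here.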

