Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

The applications of large language models (LLMs) have expanded well beyond text processing, signaling a new era in which LLMs are envisioned as generalist language agents capable of operating within complex real-world environments. These environments are often so expansive that the LLM cannot process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, this paper investigates the potential of tools to help LLMs handle such complexity. To this end, we design customized tools that support proactive exploration of these massive environments. Such tools can serve as a middleware layer that shields the LLM from environmental complexity. In two representative complex environments, knowledge bases (KBs) and databases, we demonstrate the significant potential of augmenting language agents with tools. Notably, equipped with these tools, GPT-4 achieves 2.8× the performance of the best baseline on tasks requiring access to database content and 2.2× on KB tasks. Our findings point the way toward advancing language agents in complex real-world applications.
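
To make the "middleware layer" idea concrete, here is a minimal sketch, assuming the environment is a SQLite database. The abstract does not list the authors' actual tools, so everything below is hypothetical: the class `DatabaseMiddleware`, its methods `list_tables`, `get_columns`, `find_values` and `execute`, and the `max_rows` cap are illustrative names chosen here, not the paper's tool set. The point it shows is that the LLM never sees the whole database; it calls small tools that each return a bounded slice of it.

```python
import sqlite3

# HYPOTHETICAL middleware sketch: these tool names are illustrative, not the paper's API.
class DatabaseMiddleware:
    """Exposes a large database to an LLM through small, bounded exploration tools."""

    def __init__(self, conn: sqlite3.Connection, max_rows: int = 5):
        self.conn = conn
        self.max_rows = max_rows  # cap on how many rows any single tool call returns

    def list_tables(self) -> list:
        # Schema-level view: table names only, no row data.
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [r[0] for r in rows]

    def get_columns(self, table: str) -> list:
        self._check_table(table)
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        return [r[1] for r in self.conn.execute(f'PRAGMA table_info("{table}")')]

    def find_values(self, table: str, column: str, keyword: str) -> list:
        # Lets the agent look up which stored values match a mention in the question.
        if column not in self.get_columns(table):
            raise ValueError(f"unknown column: {column}")
        sql = f'SELECT DISTINCT "{column}" FROM "{table}" WHERE "{column}" LIKE ? LIMIT ?'
        return [r[0] for r in self.conn.execute(sql, (f"%{keyword}%", self.max_rows))]

    def execute(self, sql: str) -> list:
        # Runs an agent-written query but returns at most max_rows rows.
        return self.conn.execute(sql).fetchmany(self.max_rows)

    def _check_table(self, table: str) -> None:
        if table not in self.list_tables():
            raise ValueError(f"unknown table: {table}")


# Tiny demo database standing in for a "massive" environment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", "France"), ("Bo", "Finland"), ("Cy", "Japan")],
)
tools = DatabaseMiddleware(conn, max_rows=2)

print(tools.list_tables())                             # ['singer']
print(tools.get_columns("singer"))                     # ['name', 'country']
print(tools.find_values("singer", "country", "Fin"))   # ['Finland']
print(tools.execute("SELECT name FROM singer"))        # [('Ann',), ('Bo',)]
```

In an agent loop, each method would be registered as a callable tool and the LLM would chain calls (list tables, inspect columns, match values, then query), so its context only ever holds these short outputs rather than the full database.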
