Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning

The emergence of large language models (LLMs) has opened up unprecedented possibilities for automating complex tasks, often at a level comparable to human performance. Despite these capabilities, a single LLM still struggles with tasks that demand both high accuracy and high complexity, because it must handle every facet of a multifaceted problem on its own. This paper introduces "Smurfs", a multi-agent framework designed to improve how LLMs are applied to such tasks. By turning a conventional LLM into a cooperating multi-agent ensemble, Smurfs enhances task decomposition and execution without any extra training. It does so through prompting strategies that assign distinct roles within the same model, enabling collaboration among specialized agents. The framework also gives the agents access to external tools so that complex tasks can be solved efficiently. Our empirical investigation, using the mistral-7b-instruct model as a case study, demonstrates Smurfs' strong capability in intricate tool-use scenarios. Notably, Smurfs outperforms ChatGPT-ReACT on the ToolBench I2 and I3 benchmarks with an 84.4% win rate, surpassing the highest recorded performance of a GPT-4 model, 73.5%. Furthermore, through comprehensive ablation studies, we analyze the contribution of each core component of the multi-agent framework to its overall efficacy. This not only verifies the effectiveness of the framework but also points the way for future work on multi-agent LLM systems.
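
The role-allocation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the role names (`planner`, `executor`, `answerer`), the prompt wording, the `tool_name: argument` reply format, and the helpers `run_agent`, `solve`, and `fake_llm` are all assumptions made for illustration, and a deterministic stub stands in for a real model such as mistral-7b-instruct.

```python
from typing import Callable, Dict, List

# Hypothetical role prompts; the abstract does not name the agents or their prompts.
ROLE_PROMPTS: Dict[str, str] = {
    "planner": "Split the task into numbered subtasks, one per line.",
    "executor": "Pick one tool for the subtask and reply as 'tool_name: argument'.",
    "answerer": "Combine the tool results into a final answer.",
}

LLM = Callable[[str], str]   # any function mapping a prompt to a completion
Tool = Callable[[str], str]  # any function mapping an argument to a result


def run_agent(llm: LLM, role: str, content: str) -> str:
    """Call the same underlying model, specialized only by its role prompt."""
    return llm(f"[role: {role}]\n{ROLE_PROMPTS[role]}\n\n{content}")


def solve(task: str, llm: LLM, tools: Dict[str, Tool]) -> str:
    """Decompose the task, run one tool call per subtask, then summarize."""
    plan = run_agent(llm, "planner", task)
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    results: List[str] = []
    for subtask in subtasks:
        call = run_agent(llm, "executor", subtask)
        name, _, arg = call.partition(":")
        tool = tools.get(name.strip())
        if tool is None:
            results.append(f"no tool named {name.strip()!r}")
        else:
            results.append(tool(arg.strip()))
    return run_agent(llm, "answerer", "\n".join(results))


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    if prompt.startswith("[role: planner]"):
        return "1. look up weather in Paris\n2. look up weather in Rome"
    if prompt.startswith("[role: executor]"):
        city = prompt.rsplit(" ", 1)[-1]  # last word of the subtask
        return f"weather: {city}"
    body = prompt.split("\n\n", 1)[1]     # tool results passed to the answerer
    return "Summary: " + "; ".join(body.splitlines())


if __name__ == "__main__":
    tools = {"weather": lambda city: f"{city}: sunny"}
    print(solve("weather in Paris and Rome", fake_llm, tools))
```

Replacing `fake_llm` with a call to a real model would leave the control flow unchanged, which matches the abstract's claim that no extra training is involved: the agents differ only in the prompt they receive.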

