The emergence of large language models (LLMs) has opened up unprecedented possibilities for automating complex tasks, often at a level comparable to human performance. Despite these capabilities, LLMs still struggle with tasks that demand both high accuracy and high complexity, because a single model has inherent limitations in handling multifaceted problems on its own. This paper introduces "Smurfs", a multi-agent framework designed to improve how LLMs are applied to such tasks. By transforming a conventional LLM into a cooperating multi-agent ensemble, Smurfs enhances task decomposition and execution without any additional training. This is achieved through prompting strategies that assign distinct roles within the model, thereby enabling collaboration among specialized agents. The framework also gives these agents access to external tools so that they can solve complex tasks efficiently. Our empirical investigation, which uses the mistral-7b-instruct model as a case study, demonstrates Smurfs' superior capability in intricate tool-use scenarios. Notably, Smurfs outperforms ChatGPT-ReACT on the ToolBench I2 and I3 benchmarks with an 84.4% win rate, surpassing the highest recorded performance of a GPT-4 model, 73.5%. Furthermore, through comprehensive ablation studies, we dissect the contribution of each core component of the multi-agent framework to its overall efficacy. These results not only verify the effectiveness of the framework but also chart a route for future exploration of multi-agent LLM systems.