Recent advances in LLM-based multi-agent systems (MAS) show that workflows composed of multiple LLM agents with distinct roles, tools, and communication patterns can outperform single-LLM baselines on complex tasks. However, most frameworks are homogeneous, where all agents share the same base LLM and differ only in prompts, tools, and positions in the workflow. This raises the question of whether such workflows can be simulated by a single agent through multi-turn conversations. We investigate this across seven benchmarks spanning coding, mathematics, general question answering, domain-specific reasoning, and real-world planning and tool use. Our results show that a single agent can reach the performance of homogeneous workflows with an efficiency advantage from KV cache reuse, and can even match the performance of an automatically optimized heterogeneous workflow. Building on this finding, we propose \textbf{OneFlow}, an algorithm that automatically tailors workflows for single-agent execution, reducing inference costs compared to existing automatic multi-agent design frameworks without trading off accuracy. These results position the single-LLM implementation of multi-agent workflows as a strong baseline for MAS research. We also note that single-LLM methods cannot capture heterogeneous workflows due to the lack of KV cache sharing across different LLMs, highlighting future opportunities in developing \textit{truly} heterogeneous multi-agent systems.