The burgeoning field of LLM-based Multi-Agent Systems (MAS) promises to tackle complex tasks through collaborative intelligence, yet fundamental questions about their scaling behavior and intrinsic collective dynamics remain underexplored. This paper systematically investigates how the performance of a homogeneous MAS evolves as the number of agents increases, isolating the effect of collaboration from model or knowledge heterogeneity. We propose the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture centered on sequential inter-agent communication, so that scaling effects can be observed clearly. Through extensive experiments across diverse tasks and model scales, we show that MAS performance does not scale monotonically with agent count but follows a pattern of diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Our findings reveal that an effective MAS requires a sufficiently capable base LLM, that task type critically modulates the optimal agent count, and that collective intelligence is an emergent property contingent on strategic interaction design rather than a guaranteed outcome of agent plurality. The performance degradation stems from coordination overhead rather than merely from long-context failure, and the scaling tendency generalizes across interaction architectures such as structured debate topologies. This work provides a foundational understanding of MAS scaling laws, offering practical guidance for designing efficient collaborative systems and challenging the prevailing assumption that more agents invariably lead to better performance.