LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increasing the number of agents; however, we find that such scaling exhibits strong diminishing returns in homogeneous settings, while introducing heterogeneity (e.g., different models, prompts, or tools) continues to yield substantial gains. This raises a fundamental question: what limits scaling, and why does diversity help? We present an information-theoretic framework showing that MAS performance is bounded by the intrinsic task uncertainty, not by agent count. We derive architecture-agnostic bounds demonstrating that improvements depend on how many effective channels the system accesses. Homogeneous agents saturate early because their outputs are strongly correlated, whereas heterogeneous agents contribute complementary evidence. We further introduce $K^*$, an effective channel count that can be estimated without ground-truth labels. Empirically, we show that heterogeneous configurations consistently outperform homogeneous scaling: 2 diverse agents can match or exceed the performance of 16 homogeneous agents. Our results provide principled guidelines for building efficient and robust MAS through diversity-aware design. Code and dataset are available at: https://github.com/SafeRL-Lab/Agent-Scaling.
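The abstract does not define $K^*$, so the following is only a hypothetical illustration of how an effective channel count could be estimated without ground-truth labels: the participation ratio of the eigenvalues of the agents' pairwise agreement matrix collapses to about 1 when all agents produce identical (fully correlated) outputs and grows toward $K$ as outputs become complementary. The function name `effective_channels` and this particular estimator are assumptions, not the paper's actual construction.

```python
import numpy as np

def effective_channels(answers):
    """Hypothetical label-free proxy for an effective channel count.

    answers: (K, N) array of categorical answers from K agents on N questions.
    Returns the participation ratio of the eigenvalues of the K x K pairwise
    agreement matrix: ~1 for identical agents, up to K for agents whose
    outputs never coincide.
    """
    answers = np.asarray(answers)
    # A[i, j] = fraction of questions on which agents i and j agree.
    A = (answers[:, None, :] == answers[None, :, :]).mean(axis=2)
    eig = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # agreement matrix is symmetric
    return eig.sum() ** 2 / (eig ** 2).sum()

# Four identical (homogeneous) agents: one effective channel.
homog = np.tile([0, 1, 0, 1, 1, 0], (4, 1))
print(round(effective_channels(homog), 2))  # -> 1.0

# Two agents that always disagree: two effective channels.
hetero = np.array([[0, 1, 0, 1], [1, 0, 1, 0]])
print(round(effective_channels(hetero), 2))  # -> 2.0
```

Under this sketch, watching the participation ratio plateau as agents are added would mirror the saturation behavior the abstract describes for homogeneous scaling.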