Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically the underlying rationales for their success or failure, remain largely unexplored. In this paper, we revisit MAS through the perspective of uncertainty, considering both intra- and inter-agent dynamics by investigating entropy transitions during problem-solving across various topologies and six benchmark tasks. By analyzing 245 features spanning token-, trajectory-, and round-level entropy, we counterintuitively find that a single agent outperforms MAS in approximately 43.3% of cases, and that uncertainty dynamics are largely determined during the first round of interaction. Furthermore, we provide three key observations: 1) Certainty Preference: reducing uncertainty at any stage for any agent is critical for guaranteeing correct solutions; 2) Base Uncertainty: base models with lower entropy during problem-solving directly benefit MAS performance; and 3) Task Awareness: entropy dynamics of MAS play varying roles across different tasks. Building on these insights, we introduce a simple yet effective algorithm, the Entropy Judger, to select solutions from MAS's pass@k results, leading to consistent accuracy improvements across all MAS configurations and tasks. Our source code is available at https://github.com/AgenticFinLab/multiagent-entropy.
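The abstract does not say how the Entropy Judger scores candidates, so the following is only a minimal sketch of the general idea of entropy-based selection among pass@k results, not the authors' algorithm. It assumes that each candidate answer comes with its per-token next-token probability distributions, that trajectory-level entropy is the mean of the token-level Shannon entropies, and that the candidate with the lowest mean entropy is preferred (in line with the Certainty Preference observation). The function names `token_entropy`, `trajectory_entropy`, and `select_lowest_entropy` are hypothetical and do not come from the paper or its repository.

```python
import math
from typing import List, Sequence, Tuple


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability entries contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def trajectory_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean token-level entropy over one generated answer (assumed aggregation)."""
    if not token_dists:
        raise ValueError("a trajectory needs at least one token distribution")
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def select_lowest_entropy(
    candidates: List[Tuple[str, Sequence[Sequence[float]]]]
) -> str:
    """Return the answer whose trajectory has the lowest mean entropy.

    Each candidate is (answer_text, per_token_distributions).
    This selection rule is an illustrative assumption, not the paper's method.
    """
    best_answer, _ = min(candidates, key=lambda c: trajectory_entropy(c[1]))
    return best_answer


if __name__ == "__main__":
    # Toy pass@2 example with two-token answers over a two-word vocabulary.
    toy_candidates = [
        ("answer A", [[0.5, 0.5], [0.5, 0.5]]),  # maximally uncertain tokens
        ("answer B", [[0.9, 0.1], [1.0, 0.0]]),  # mostly confident tokens
    ]
    print(select_lowest_entropy(toy_candidates))
```

In practice the per-token distributions would come from the model's output logits after a softmax; how the paper combines token-, trajectory-, and round-level features is described in its main text and source code, not here.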