Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically the underlying rationales for their success or failure, remain largely unexplored. In this paper, we revisit MAS through the perspective of uncertainty, considering both intra- and inter-agent dynamics by investigating entropy transitions during problem-solving across various topologies and six benchmark tasks. By analyzing 245 features spanning token-, trajectory-, and round-level entropy, we counterintuitively find that a single agent outperforms MAS in approximately 43.3% of cases, and that uncertainty dynamics are largely determined during the first round of interaction. Furthermore, we provide three key observations: 1) Certainty Preference: reducing uncertainty at any stage for any agent is critical for guaranteeing correct solutions; 2) Base Uncertainty: base models with lower entropy during problem-solving directly benefit MAS performance; and 3) Task Awareness: entropy dynamics of MAS play varying roles across different tasks. Building on these insights, we introduce a simple yet effective algorithm, the Entropy Judger, to select solutions from MAS's pass@k results, leading to consistent accuracy improvements across all MAS configurations and tasks. Our source code is available at https://github.com/AgenticFinLab/multiagent-entropy.
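To make the entropy-based selection idea concrete, the following is a minimal sketch, not the authors' Entropy Judger. It assumes each candidate solution comes with the next-token probability distributions recorded during generation, and it uses an illustrative rule motivated by the Certainty Preference observation: pick the candidate with the lowest mean token-level entropy. The names `token_entropy`, `trajectory_entropy`, `select_lowest_entropy`, and the dictionary keys `answer` and `token_dists` are hypothetical; the actual algorithm draws on a much richer feature set and is available in the linked repository.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability entries contribute nothing and would break log().
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def trajectory_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean token-level entropy over one generated solution."""
    if not token_dists:
        raise ValueError("trajectory has no tokens")
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def select_lowest_entropy(candidates: List[Dict]) -> Dict:
    """Assumed selection rule: return the candidate the model was most certain about."""
    return min(candidates, key=lambda c: trajectory_entropy(c["token_dists"]))


# Toy pass@k pool with two candidates and made-up distributions.
candidates = [
    {"answer": "A", "token_dists": [[0.9, 0.1], [0.8, 0.2]]},
    {"answer": "B", "token_dists": [[0.5, 0.5], [0.6, 0.4]]},
]
best = select_lowest_entropy(candidates)
```

In this toy pool, candidate `A` has the sharper distributions at every step, so the sketch selects it. A trajectory-level mean is only one possible aggregate; round-level or per-agent statistics would need their own reductions.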