Multi-agent debate (MAD) systems improve LLM reasoning through iterative deliberation, but they remain vulnerable to debate collapse, a failure mode in which agents converge on erroneous reasoning and the final decision is compromised. Existing methods lack principled mechanisms to detect or prevent such failures. To address this gap, we first propose a hierarchical metric that quantifies behavioral uncertainty at three levels: intra-agent (individual reasoning uncertainty), inter-agent (interactive uncertainty), and system-level (output uncertainty). Empirical analysis across several benchmarks shows that these uncertainty measures reliably indicate system failures, validating their use as diagnostic metrics. We then propose a mitigation strategy that formulates an uncertainty-driven policy optimization to penalize self-contradiction, peer conflict, and low-confidence outputs in a dynamic debating environment. Experiments demonstrate that this uncertainty-driven mitigation reliably calibrates the multi-agent system, consistently improving decision accuracy while reducing system disagreement.