Multi-Agent Debate (MAD) improves LLM-agent accuracy but suffers from rapid context growth, limiting scalability in larger multi-agent settings. Existing methods prune low-utility communications using prior signals, such as token-level log-likelihoods or LLM self-reported confidence. However, these signals become unreliable under hallucination, degrading the accuracy of MAD methods that rely on them. We propose SVR-MAD, a Bayesian-inspired MAD framework that treats pre-debate signals as priors and debate outcomes as posterior-style evidence for estimating agent correctness. SVR-MAD uses this evidence to incrementally construct the communication graph, prioritizing agents whose answers survive peer challenges. Experiments across multiple LLMs and benchmarks show that SVR-MAD reduces token cost by up to 61% while matching or improving accuracy relative to the most accurate competing MAD baseline.
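The prior-to-posterior idea can be illustrated with a minimal sketch. This is not the SVR-MAD algorithm, whose update rule and graph construction are not specified in this abstract. It assumes a simple Beta-Bernoulli model in which a pre-debate confidence score seeds a weak prior and each peer challenge an answer survives or fails counts as one observation. The names `AgentBelief`, `prior_from_confidence`, `update`, `select_speakers`, and the `strength` pseudo-count are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentBelief:
    """Beta(alpha, beta) belief that an agent's answer is correct (assumed model)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected probability of correctness under the Beta belief.
        return self.alpha / (self.alpha + self.beta)


def prior_from_confidence(confidence: float, strength: float = 2.0) -> AgentBelief:
    """Turn a pre-debate signal in [0, 1] into a weak Beta prior.

    `strength` is an assumed pseudo-count; a small value lets debate
    outcomes quickly outweigh an unreliable prior.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if strength <= 0.0:
        raise ValueError("strength must be positive")
    return AgentBelief(confidence * strength, (1.0 - confidence) * strength)


def update(belief: AgentBelief, survived: int, failed: int) -> AgentBelief:
    """Add debate evidence: challenges the answer survived vs. failed."""
    return AgentBelief(belief.alpha + survived, belief.beta + failed)


def select_speakers(beliefs: dict[str, AgentBelief], k: int) -> list[str]:
    """Pick the k agents with the highest posterior mean to keep in the graph."""
    ranked = sorted(beliefs, key=lambda name: beliefs[name].mean, reverse=True)
    return ranked[:k]


# Agent "a" is overconfident but fails every challenge; "b" is unsure but survives.
beliefs = {
    "a": update(prior_from_confidence(0.9), survived=0, failed=3),
    "b": update(prior_from_confidence(0.5), survived=3, failed=0),
}
chosen = select_speakers(beliefs, k=1)
```

In this toy setup the overconfident agent is ranked below the agent whose answer withstood its challenges, which is the behavior the abstract attributes to using debate outcomes as evidence rather than trusting pre-debate signals alone.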