Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System

The multi-agent debate (MAD) system, which imitates the process of human discussion in pursuit of truth, aims to align the correct cognition of different agents to reach the optimal solution. However, because agents have limited and differing knowledge backgrounds (i.e., cognitive islands), it is challenging to make them reach correct and highly consistent cognition, which hinders the search for the optimal solution. To address this challenge, we propose a novel \underline{M}ulti-\underline{A}gent \underline{D}ebate with \underline{K}nowledge-\underline{E}nhanced framework (\textbf{MADKE}) that helps the system find the optimal solution. First, we introduce a shared pool of retrieved knowledge into the debate process to address the problem of limited and differing knowledge backgrounds. Then, we propose an adaptive knowledge selection method to ensure the accuracy and personalization of the knowledge each agent uses. This method allows each agent to decide, in every conversation round and according to its own needs, whether to use external knowledge. Experimental results on six datasets show that our method achieves state-of-the-art performance compared with existing single-agent and multi-agent methods. Further analysis reveals that introducing retrieved knowledge helps agents break cognitive islands during the debate and effectively improves the consistency and correctness of the model. Moreover, MADKE with Qwen1.5-72B-Chat surpasses GPT-4 by 1.26\% on average across the six datasets, demonstrating that our method can help open-source LLMs match or even surpass the performance of GPT-4. Our code is available at \url{https://github.com/FutureForMe/MADKE}.

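The abstract combines two mechanisms: a knowledge pool shared by all debaters, and a per-round, per-agent decision about whether to use any of that knowledge. The Python sketch below illustrates how these pieces could fit into a debate loop. It is a toy under stated assumptions, not the MADKE implementation: Jaccard word overlap with a per-agent threshold stands in for the LLM-based knowledge selection, rule-based toy agents stand in for LLMs, early stopping on unanimous agreement plus a final majority vote is an assumed aggregation rule, and the names `overlap`, `select_knowledge`, `debate`, and `make_toy_agent` and all thresholds are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical sketch, not the MADKE implementation: toy scoring and toy agents
# stand in for LLM calls. Agent signature (assumed): (question, selected
# evidence or None, other agents' answers from the previous round) -> answer.
Agent = Callable[[str, Optional[str], List[str]], str]


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, a toy stand-in for an LLM's relevance judgement."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def select_knowledge(query: str, pool: List[str], threshold: float) -> Optional[str]:
    """Return the most relevant snippet in the shared pool, or None when nothing
    clears this agent's threshold (the agent opts out of external knowledge)."""
    if not pool:
        return None
    best = max(pool, key=lambda doc: overlap(query, doc))
    return best if overlap(query, best) >= threshold else None


def debate(question: str, agents: List[Agent], pool: List[str],
           thresholds: List[float], max_rounds: int = 3) -> str:
    """Run debate rounds over one shared knowledge pool; return the majority answer."""
    answers: List[str] = []
    for _ in range(max_rounds):
        new_answers = []
        for i, agent in enumerate(agents):
            # Re-select every round: after round 1 the query also contains
            # the agent's own previous answer.
            query = question if not answers else f"{question} {answers[i]}"
            evidence = select_knowledge(query, pool, thresholds[i])
            peers = answers[:i] + answers[i + 1:]  # other agents' last answers
            new_answers.append(agent(question, evidence, peers))
        answers = new_answers
        if len(set(answers)) == 1:  # every agent agrees: stop early
            break
    return Counter(answers).most_common(1)[0][0]


def make_toy_agent(prior: str) -> Agent:
    """Toy agent: trusts evidence (its last word), else follows the peers'
    majority, else falls back to its own prior belief."""
    def agent(question: str, evidence: Optional[str], peers: List[str]) -> str:
        if evidence is not None:
            return evidence.split()[-1]
        if peers:
            return Counter(peers).most_common(1)[0][0]
        return prior
    return agent


if __name__ == "__main__":
    pool = ["the capital of australia is canberra",
            "sydney is the largest city in australia"]
    agents = [make_toy_agent("sydney") for _ in range(3)]
    # The middle agent's high threshold means it never consults the pool.
    print(debate("what is the capital of australia", agents, pool, [0.5, 0.9, 0.5]))
    # prints: canberra
```

Giving each agent its own threshold is one simple way to model personalized selection: in the example, all three agents start with the same wrong prior, the two agents that consult the pool answer correctly in the first round, and the agent that never uses external knowledge converges only by following its peers in the second round. A real system would replace `overlap` and the toy agents with LLM calls.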