Recent progress in LLM discussion suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, proposing a novel group-discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same performance as the best existing discussion approach across a wide range of reasoning tasks and backbone LLMs. We observe that multi-agent discussion outperforms a single agent only when the prompt contains no demonstrations. Further study reveals the common interaction mechanisms of LLMs during discussion.