Large language models (LLMs) have demonstrated remarkable capabilities in language generation, understanding, and few-shot learning in recent years. An extensive body of work has explored how their performance may be further improved through prompting techniques such as verification, self-consistency, and intermediate scratchpads. In this paper, we present a complementary approach to improving language responses, in which multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks. We also demonstrate that our approach improves the factual validity of generated content, reducing the fallacious answers and hallucinations to which contemporary models are prone. Our approach may be directly applied to existing black-box models and uses an identical procedure and prompts for all tasks we investigate. Overall, our findings suggest that such a "society of minds" approach has the potential to significantly advance the capabilities of LLMs and pave the way for further breakthroughs in language generation and understanding.
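The multi-round debate procedure summarized in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Agent` callable stands in for a black-box LLM call. The prompt wording in the hypothetical `build_debate_prompt` helper is invented for illustration. Aggregating the last round by majority vote is also an assumption of this sketch. Finally, `make_toy_agent` is a hypothetical deterministic stand-in so the loop runs without a real model.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interface: an agent (e.g. a black-box LLM call) maps a prompt to a response.
Agent = Callable[[str], str]

OWN_PREFIX = "Your previous answer: "


def build_debate_prompt(question: str, own: str, others: List[str]) -> str:
    """Hypothetical prompt template: show an agent the other agents' answers and ask it to revise."""
    others_text = "\n".join(f"- {o}" for o in others)
    return (f"Question: {question}\n{OWN_PREFIX}{own}\n"
            f"Other agents answered:\n{others_text}\n"
            "Using their reasoning as additional advice, give an updated answer.")


def debate(question: str, agents: List[Agent], rounds: int = 2) -> str:
    # Round 0: every agent answers the question independently.
    answers = [agent(question) for agent in agents]
    # Debate rounds: each agent reads the others' latest answers and revises its own.
    # The comprehension reads the previous round's list throughout, so updates are simultaneous.
    for _ in range(rounds):
        answers = [
            agent(build_debate_prompt(question, answers[i], answers[:i] + answers[i + 1:]))
            for i, agent in enumerate(agents)
        ]
    # Assumed aggregation: majority vote over the final-round answers.
    return Counter(answers).most_common(1)[0][0]


def make_toy_agent(initial: str) -> Agent:
    """Hypothetical stand-in for an LLM: answers `initial`, then adopts the most common
    answer among its own previous answer and the other agents' answers."""
    def agent(prompt: str) -> str:
        if "Other agents answered:" not in prompt:
            return initial
        lines = prompt.splitlines()
        own = next(l for l in lines if l.startswith(OWN_PREFIX))[len(OWN_PREFIX):]
        others = [l[2:] for l in lines if l.startswith("- ")]
        return Counter([own] + others).most_common(1)[0][0]
    return agent


if __name__ == "__main__":
    toy_agents = [make_toy_agent("42"), make_toy_agent("42"), make_toy_agent("41")]
    print(debate("What is 6 * 7?", toy_agents, rounds=2))
```

In this toy run, the third agent starts with "41" and switches to "42" after seeing the other two answers, so the agents converge on "42". With real models, `Agent` would wrap an API call, and the number of agents and rounds would be tuning choices.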