Large language models (LLMs) have shown the capacity to improve their performance on reasoning tasks by reflecting on their mistakes and acting with these reflections in mind. However, when the same LLM repeatedly reflects on its own output, it exhibits degeneration of thought: it keeps repeating the same errors even when it knows they are wrong. To address this problem, we instead generate reflections through a multi-agent debate among multi-persona debaters. Through extensive experimentation, we find that this leads to greater diversity in the reflections generated by the LLM agent. We demonstrate 47% exact match (EM) on HotPot QA (question answering) and 82.7% accuracy on HumanEval (programming), both surpassing reflection with a single LLM.
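To make the idea of multi-persona debate as a source of reflections more concrete, the following is a minimal sketch and not the implementation evaluated here. The persona names, the prompt wording, the number of rounds, and the helpers `debate_reflections`, `summarize_reflection`, and `echo_llm` are all illustrative assumptions; `LLM` stands for any function that maps a prompt string to a completion string.

```python
from typing import Callable, List, Sequence

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Illustrative personas only; the abstract does not list the ones actually used.
PERSONAS = ("skeptical reviewer", "domain expert", "careful test writer")


def debate_reflections(llm: LLM, task: str, failed_attempt: str,
                       personas: Sequence[str] = PERSONAS,
                       rounds: int = 2) -> List[str]:
    """Collect one critique per persona per round.

    Each debater sees the critiques written so far, so later turns can
    disagree with earlier ones instead of repeating them.
    """
    transcript: List[str] = []
    for _ in range(rounds):
        for persona in personas:
            history = "\n".join(transcript) or "(none yet)"
            prompt = (
                f"You are a {persona}.\n"
                f"Task: {task}\n"
                f"Failed attempt: {failed_attempt}\n"
                f"Critiques so far:\n{history}\n"
                "Explain what went wrong; disagree with earlier critiques if needed."
            )
            transcript.append(f"[{persona}] {llm(prompt)}")
    return transcript


def summarize_reflection(llm: LLM, transcript: Sequence[str]) -> str:
    """Condense the debate into a single reflection for the next attempt."""
    joined = "\n".join(transcript)
    return llm(f"Summarize these critiques into one actionable reflection:\n{joined}")


def echo_llm(prompt: str) -> str:
    """Stand-in model for offline runs: returns the first line of the prompt."""
    return prompt.splitlines()[0]


if __name__ == "__main__":
    debate = debate_reflections(echo_llm, "Sum a list", "returned 0 for [1, 2]")
    for turn in debate:
        print(turn)
    print(summarize_reflection(echo_llm, debate))
```

In practice `echo_llm` would be replaced by a call to a real model. The single-LLM baseline corresponds roughly to calling this with one persona and one round, which is where repeated, near-identical reflections would be expected to appear.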