This paper proposes a group-deliberation-oriented multi-agent conversational model to address the limitations of single large language models on complex reasoning tasks. The model adopts a three-tier role architecture of generation, verification, and integration: an opinion-generation agent produces diverse reasoning perspectives, an evidence-verification agent retrieves external knowledge and quantifies factual support, and a consistency-arbitration agent integrates logically coherent conclusions. A self-play mechanism is introduced to expand multi-path reasoning trajectories, while a retrieval-augmentation module dynamically supplements external knowledge. A composite reward function combining factual consistency and logical coherence is designed, and an improved proximal policy optimization (PPO) strategy is applied for collaborative training. Experimental results show that the proposed model improves multi-hop reasoning accuracy by 16.8% on HotpotQA, 14.3% on 2WikiMultihopQA, and 19.2% on MeetingBank, while improving consistency by 21.5%. The model also achieves higher reasoning efficiency than mainstream multi-agent approaches, providing an effective and stable solution for complex reasoning tasks.
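The composite reward described above can be sketched as a weighted blend of the two signals. This is an illustrative assumption, not the paper's actual formulation: the function name, the linear combination, and the weight `alpha` are hypothetical stand-ins for whatever scoring functions and aggregation the model actually uses.

```python
def composite_reward(factual_score: float, coherence_score: float,
                     alpha: float = 0.6) -> float:
    """Blend a factual-consistency score and a logical-coherence score
    into a single scalar reward.

    Both input scores are assumed to lie in [0, 1]; `alpha` (hypothetical,
    chosen here for illustration) controls the trade-off between them.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Higher alpha weights factual grounding over logical coherence.
    return alpha * factual_score + (1.0 - alpha) * coherence_score
```

In a PPO-style training loop, this scalar would serve as the per-episode reward for the agent ensemble; the verification and arbitration agents would supply the two component scores.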