Large language models (LLMs) have demonstrated a remarkable ability to serve as general-purpose tools for a wide range of language-based tasks. Recent work has shown that the efficacy of such models can be improved through iterative dialog between multiple models, frequently referred to as multi-agent debate (MAD). While debate shows promise as a means of improving model efficacy, most work in this area treats debate as an emergent behavior rather than a learned one. As a result, current debate frameworks rely on collaborative behaviors having already been sufficiently trained into off-the-shelf models. To address this limitation, we propose ACC-Debate, an Actor-Critic based learning framework that produces a two-agent team specialized in debate. We demonstrate that ACC-Debate outperforms state-of-the-art (SotA) debate techniques on a wide array of benchmarks.