Multi-agent interactions between Large Language Model (LLM) agents have yielded major improvements on diverse reasoning tasks. However, these interactions involve long generations from multiple models across several rounds, making them expensive. Moreover, multi-agent approaches do not produce a single final model for efficient inference. To address this, we introduce MAGDi, a new method for structured distillation of the reasoning interactions between multiple LLMs into smaller LMs. MAGDi teaches smaller models by representing multi-agent interactions as graphs, augmenting a base student model with a graph encoder, and distilling knowledge using three objective functions: next-token prediction, a contrastive loss between correct and incorrect reasoning, and a graph-based objective to model the interaction structure. Experiments on seven widely used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from either a single teacher or multiple teachers. MAGDi is also an order of magnitude more efficient than its teachers. We conduct extensive analyses to show that MAGDi (1) improves generalization to out-of-domain tasks, (2) scales positively with the size and strength of the base student model, and (3) obtains larger improvements (via our multi-teacher training) when applying self-consistency, an inference technique that relies on model diversity.
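As a rough illustration of the self-consistency technique mentioned above, the following minimal sketch takes a majority vote over the final answers of several sampled reasoning chains. It is not taken from the MAGDi implementation: the function name `self_consistency_vote` and the example answers are hypothetical, and the sampling step is assumed to have already produced one final answer string per chain.

```python
from collections import Counter


def self_consistency_vote(answers):
    """Return the most frequent final answer among sampled reasoning chains.

    `answers` is assumed to hold one final answer string per sampled chain;
    generating those chains from a model is outside this sketch.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) returns [(answer, count)]; on a tie, the answer
    # encountered first in `answers` is returned.
    answer, _ = counts.most_common(1)[0]
    return answer


if __name__ == "__main__":
    # Hypothetical final answers extracted from five sampled chains.
    sampled = ["18", "18", "20", "18", "21"]
    print(self_consistency_vote(sampled))
```

A vote of this kind only helps when the sampled chains disagree in useful ways, which is why the technique depends on model diversity.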