While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, their practical deployment is limited by high computational cost and error propagation. This paper proposes AgentArk, a novel framework that distills multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of a multi-agent system while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scales, and scenarios: reasoning-enhanced fine-tuning, trajectory-based augmentation, and process-aware distillation. By shifting the computational burden from inference to training, the distilled models preserve the efficiency of a single agent while exhibiting the strong reasoning and self-correction capabilities of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work sheds light on future research on efficient and robust multi-agent development. Our code is available at https://github.com/AIFrontierLab/AgentArk.