As the development of Large Language Models (LLMs) shifts from parameter scaling to inference-time collaboration, the Mixture-of-Agents (MoA) framework has emerged as a general paradigm for harnessing collective intelligence by layering diverse models. While recent MoA variants have introduced dynamic routing and residual connections to improve efficiency, these methods often fail to facilitate deep semantic interaction between agents, limiting the system's ability to actively correct hallucinations and refine reasoning. In this paper, we introduce Attention-MoA, a novel MoA-based framework that redefines collaboration through Inter-agent Semantic Attention. Complemented by an Inter-layer Residual Module with an Adaptive Early Stopping mechanism, our architecture mitigates information degradation in deep layers while improving computational efficiency. Extensive evaluations on AlpacaEval 2.0, MT-Bench, and FLASK demonstrate that Attention-MoA significantly outperforms state-of-the-art baselines, achieving a 91.15% Length-Controlled (LC) Win Rate on AlpacaEval 2.0 and leading on 10 of 12 capability dimensions on FLASK. Notably, Attention-MoA enables an ensemble of small open-source models to outperform much larger proprietary models such as Claude-4.5-Sonnet and GPT-4.1, reaching an MT-Bench score of 8.83 and an AlpacaEval 2.0 LC Win Rate of 77.36%.
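To make the mechanism named above concrete, the following is a minimal mathematical sketch of one plausible form of Inter-agent Semantic Attention combined with an inter-layer residual and an early-stopping rule. The projections $W_Q, W_K, W_V$, the agent-output embeddings $h_i^{(\ell)}$, the refinement map $f_i$, and the threshold $\epsilon$ are our illustrative assumptions; the abstract does not specify the exact formulation.

% Sketch (assumed form): at layer $\ell$, the $n$ agents' responses to
% prompt $x$ are embedded as $h_1^{(\ell)}, \dots, h_n^{(\ell)} \in \mathbb{R}^d$,
% and agent $i$ attends over its peers' semantics before refining its answer.
\begin{align}
  \alpha_{ij}^{(\ell)} &= \operatorname{softmax}_{j}\!\left(
      \frac{\bigl(W_Q h_i^{(\ell)}\bigr)^{\top} W_K\, h_j^{(\ell)}}{\sqrt{d}}
    \right)
    && \text{(inter-agent attention weights)} \\
  c_i^{(\ell)} &= \sum_{j=1}^{n} \alpha_{ij}^{(\ell)}\, W_V\, h_j^{(\ell)}
    && \text{(aggregated peer context)} \\
  h_i^{(\ell+1)} &= f_i\!\bigl(x,\, c_i^{(\ell)}\bigr) + h_i^{(\ell)}
    && \text{(refinement + inter-layer residual)}
\end{align}
% Adaptive early stopping (assumed criterion): halt the stack at layer
% $\ell$ once $\max_i \bigl\| h_i^{(\ell+1)} - h_i^{(\ell)} \bigr\| < \epsilon$,
% i.e., once the agents' responses stop changing meaningfully across layers.

Under these assumptions, the residual term $+\,h_i^{(\ell)}$ carries earlier-layer information forward, which is one way to mitigate the deep-layer degradation the abstract describes, while the stopping criterion trades depth for inference-time compute.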