Multi-agent systems (MAS) built on large language models (LLMs) have shown strong performance across many tasks. Most existing approaches improve only one aspect at a time, such as the communication topology, role assignment, or LLM routing, while treating each agent as a single, indivisible unit. This misses the opportunity to use mixtures of LLMs within an agent to strengthen role-specific abilities. We propose HieraMAS, a hierarchical collaboration framework that combines intra-node LLM mixtures with an inter-node communication topology. HieraMAS introduces supernodes, where each functional role is implemented by multiple heterogeneous LLMs using a propose-synthesis structure. Optimizing HieraMAS creates unique credit-assignment challenges: final task performance depends heavily on the underlying LLMs' capabilities, which can lead reinforcement learning methods to incorrectly reward suboptimal configurations. To address this, we use a two-stage algorithm: (1) multi-level reward attribution, which provides fine-grained feedback at both the node level and the overall system level; and (2) graph classification for topology selection, which treats choosing the communication structure as a holistic decision rather than optimizing edges one by one. Experiments on reasoning and coding benchmarks show that HieraMAS substantially outperforms existing methods while also delivering better cost-performance trade-offs.
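To make the supernode idea concrete, the following is a minimal sketch of the propose-synthesis structure: several heterogeneous LLMs draft role-specific answers in parallel, and a synthesizer LLM merges them into the node's single output. All names (`Supernode`, `model_a`, `merger`) and the prompt format are illustrative assumptions, not the paper's implementation; real LLM backends are stood in for by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption for this sketch: an "LLM" is any callable from prompt -> text.
LLM = Callable[[str], str]

@dataclass
class Supernode:
    """One functional role backed by several heterogeneous LLMs.

    Proposers each draft an answer independently; a synthesizer LLM
    then merges the drafts into the role's single output
    (the propose-synthesis structure described in the abstract).
    """
    role: str
    proposers: List[LLM]
    synthesizer: LLM

    def run(self, task: str) -> str:
        # Propose: every underlying LLM answers the role-tagged task.
        drafts = [p(f"[{self.role}] {task}") for p in self.proposers]
        # Synthesize: one LLM consolidates the drafts into one output.
        joined = "\n---\n".join(drafts)
        return self.synthesizer(
            f"Synthesize one {self.role} answer from:\n{joined}"
        )

# Toy stand-ins for real heterogeneous LLM backends (illustration only).
def model_a(prompt: str) -> str: return "draft A for: " + prompt
def model_b(prompt: str) -> str: return "draft B for: " + prompt
def merger(prompt: str) -> str:  return "merged: " + prompt.splitlines()[0]

node = Supernode("planner", [model_a, model_b], merger)
print(node.run("sort a list"))
```

From the outside, a supernode still behaves as one agent with one role; the LLM mixture is hidden behind its single `run` interface, which is what lets the inter-node topology stay unchanged.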
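The contrast between holistic and edge-by-edge topology choice can also be sketched. Here candidate communication graphs are enumerated whole and scored by a single function standing in for a learned graph classifier; the roles, the hand-made score, and the chain preference are assumptions for illustration, not HieraMAS's actual classifier or training signal.

```python
import itertools

# Three illustrative roles; a topology is a full directed adjacency matrix.
ROLES = ["planner", "coder", "critic"]

def candidate_topologies(n):
    """Enumerate all directed graphs on n nodes (no self-loops)."""
    edges = [(i, j) for i in range(n) for j in range(n) if i != j]
    for mask in itertools.product([0, 1], repeat=len(edges)):
        adj = [[0] * n for _ in range(n)]
        for (i, j), on in zip(edges, mask):
            adj[i][j] = on
        yield adj

def score(adj):
    """Stand-in for a learned graph classifier: a hand-made score that
    prefers a planner->coder->critic chain (an assumption, chosen only
    so the example has a well-defined optimum)."""
    chain = adj[0][1] + adj[1][2]         # reward the two chain edges
    clutter = sum(map(sum, adj)) - chain  # penalize all other edges
    return chain - 0.5 * clutter

# Holistic decision: argmax over whole graphs,
# not greedy per-edge additions or deletions.
best = max(candidate_topologies(len(ROLES)), key=score)
```

Scoring whole graphs lets the selector capture interactions between edges (e.g. two edges that only help jointly), which a one-edge-at-a-time optimizer can miss.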