Balancing individual specialisation and shared behaviours is a critical challenge in multi-agent reinforcement learning (MARL). Existing methods typically focus on encouraging diversity or leveraging shared representations. Full parameter sharing (FuPS) improves sample efficiency but struggles to learn diverse behaviours when they are required, while no parameter sharing (NoPS) enables diversity but is computationally expensive and sample-inefficient. To address these challenges, we introduce HyperMARL, a novel approach using hypernetworks to balance efficiency and specialisation. HyperMARL generates agent-specific actor and critic parameters, enabling agents to adaptively exhibit diverse or homogeneous behaviours as needed, without modifying the learning objective or requiring prior knowledge of the optimal diversity. Furthermore, HyperMARL decouples agent-specific and state-based gradients, which empirically correlates with reduced policy gradient variance and may help explain its ability to capture diverse behaviours. Across MARL benchmarks requiring homogeneous, heterogeneous, or mixed behaviours, HyperMARL consistently matches or outperforms FuPS, NoPS, and diversity-focused methods, achieving NoPS-level diversity with a shared architecture. These results highlight the potential of hypernetworks as a versatile approach to the trade-off between specialisation and shared behaviours in MARL.
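To make the idea of generating agent-specific parameters from a shared module concrete, the following is a minimal sketch and not the HyperMARL implementation. It assumes a linear hypernetwork, learned per-agent embeddings, and a linear softmax actor; the names `LinearHypernetwork`, `generate`, and `policy` are hypothetical and chosen only for illustration. The agent identity enters only through the hypernetwork and the observation enters only through the generated actor, which is one simple way to keep the two inputs on separate paths.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHypernetwork:
    """Shared module mapping an agent embedding to that agent's actor parameters.

    Illustrative assumption: a single linear layer generates the weights and
    bias of a linear actor. All agents share `head`; only embeddings differ.
    """

    def __init__(self, n_agents, embed_dim, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.n_actions = n_actions
        # Per-agent embeddings (assumed learnable in a real training loop).
        self.embeddings = [
            [rng.gauss(0.0, 1.0) for _ in range(embed_dim)]
            for _ in range(n_agents)
        ]
        # Shared linear map: embedding -> flattened actor weights and bias.
        n_target = n_actions * obs_dim + n_actions
        self.head = [
            [rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
            for _ in range(n_target)
        ]

    def generate(self, agent_id):
        """Return (weights, bias) of the actor for the given agent."""
        flat = matvec(self.head, self.embeddings[agent_id])
        split = self.n_actions * self.obs_dim
        weights = [
            flat[r * self.obs_dim:(r + 1) * self.obs_dim]
            for r in range(self.n_actions)
        ]
        bias = flat[split:]
        return weights, bias


def policy(hyper, agent_id, obs):
    """Action probabilities: the observation only passes through generated weights."""
    weights, bias = hyper.generate(agent_id)
    logits = [z + b for z, b in zip(matvec(weights, obs), bias)]
    return softmax(logits)


hyper = LinearHypernetwork(n_agents=3, embed_dim=4, obs_dim=5, n_actions=2)
obs = [0.5, -1.0, 0.25, 0.0, 2.0]
probs = [policy(hyper, agent_id, obs) for agent_id in range(3)]
```

In this sketch, agents with different embeddings receive different actor weights from the same shared `head`, while agents with identical embeddings receive identical policies, so the degree of behavioural diversity is carried by the embeddings and not fixed by the architecture. A critic could be generated in the same way under the same assumptions.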