Adaptive cooperation in multi-agent reinforcement learning (MARL) requires policies to express homogeneous, specialised, or mixed behaviours, yet achieving this adaptivity remains a critical challenge. While parameter sharing (PS) is standard for efficient learning, it notoriously suppresses the behavioural diversity required for specialisation. This failure is largely due to cross-agent gradient interference, a problem we find is surprisingly exacerbated by the common practice of coupling agent IDs with observations. Existing remedies typically add complexity through altered objectives, manually preset diversity levels, or sequential updates -- raising a fundamental question: can shared policies adapt without these intricacies? We propose a solution built on a key insight: an agent-conditioned hypernetwork can generate agent-specific parameters and decouple observation- and agent-conditioned gradients, directly countering the interference from coupling agent IDs with observations. Our resulting method, HyperMARL, avoids the complexities of prior work and empirically reduces policy gradient variance. Across diverse MARL benchmarks (22 scenarios, up to 30 agents), HyperMARL achieves performance competitive with six key baselines while preserving behavioural diversity comparable to non-parameter-sharing methods, establishing it as a versatile and principled approach for adaptive MARL. The code is publicly available at https://github.com/KaleabTessera/HyperMARL.
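To make the key insight concrete, the following is a minimal, illustrative PyTorch sketch (not the authors' implementation; the class and parameter names are hypothetical) of an agent-conditioned hypernetwork: a small network maps a learned agent embedding to the weights of a per-agent policy head, so the observation never has the agent ID concatenated to it, and observation- and agent-conditioned gradients flow through separate modules.

```python
import torch
import torch.nn as nn

class AgentConditionedHyperPolicy(nn.Module):
    """Sketch: a hypernetwork generates agent-specific policy-head
    parameters from an agent embedding, instead of appending the
    agent ID to the observation."""

    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, embed_dim: int = 8):
        super().__init__()
        self.obs_dim, self.act_dim = obs_dim, act_dim
        # Agent-conditioned pathway: ID -> embedding -> flattened (W, b)
        self.agent_embed = nn.Embedding(n_agents, embed_dim)
        self.hyper = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, act_dim * obs_dim + act_dim),
        )

    def forward(self, obs: torch.Tensor, agent_id: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); agent_id: (batch,) long tensor
        params = self.hyper(self.agent_embed(agent_id))
        # Reshape the generated parameters into a per-agent linear head
        W = params[:, : self.act_dim * self.obs_dim].view(-1, self.act_dim, self.obs_dim)
        b = params[:, self.act_dim * self.obs_dim :]
        # Observation-conditioned pathway: apply each agent's own head
        logits = torch.bmm(W, obs.unsqueeze(-1)).squeeze(-1) + b
        return logits

# Usage: three agents share one model but receive distinct generated parameters
policy = AgentConditionedHyperPolicy(n_agents=3, obs_dim=4, act_dim=2)
obs = torch.randn(3, 4)
agent_ids = torch.arange(3)
logits = policy(obs, agent_ids)
```

Because the agent ID enters only through the hypernetwork, gradients with respect to the shared observation pathway and the agent-specific parameters are computed through separate modules, which is the decoupling the abstract describes.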