Parameter sharing is a key strategy in multi-agent reinforcement learning (MARL) for improving scalability, yet conventional fully shared architectures often collapse into homogeneous behaviors. Recent methods introduce diversity through clustering, pruning, or masking, but typically compromise resource efficiency. We propose Prism, a parameter sharing framework that induces inter-agent diversity by representing shared networks in the spectral domain via singular value decomposition (SVD). All agents share the singular vector directions while learning distinct spectral masks on singular values. This mechanism encourages inter-agent diversity and preserves scalability. Extensive experiments on both homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) benchmarks show that Prism achieves competitive performance with superior resource efficiency.
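To make the mechanism concrete, the following is a minimal NumPy sketch of the idea as described above: one weight matrix is factored once by SVD, the singular vectors are shared by every agent, and each agent owns only a mask over the singular values. It is an illustration under stated assumptions, not the Prism implementation. The sigmoid parameterization of the mask, the random initialization, and all names (`make_shared_factors`, `agent_weight`, `mask_logits`) are hypothetical, and training of the masks is omitted.

```python
import numpy as np


def make_shared_factors(weight):
    """Factor a weight matrix once; U and Vt are shared by all agents."""
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    return U, S, Vt


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def agent_weight(U, S, Vt, mask_logits):
    """Rebuild one agent's weight matrix from the shared factors and its own mask.

    Assumption: a soft mask in (0, 1) obtained from a sigmoid; the abstract
    does not specify how the spectral mask is parameterized.
    """
    mask = sigmoid(mask_logits)        # one entry per singular value
    return (U * (S * mask)) @ Vt       # same as U @ diag(S * mask) @ Vt


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))        # stand-in for one shared layer
U, S, Vt = make_shared_factors(W)

n_agents = 3
# Each agent holds only a vector of mask logits (hypothetical initialization).
mask_logits = rng.standard_normal((n_agents, S.size))
agent_weights = [agent_weight(U, S, Vt, m) for m in mask_logits]

shared_params = U.size + S.size + Vt.size   # stored once for all agents
per_agent_params = S.size                   # stored once per agent
```

In this sketch the per-agent cost grows with the number of singular values rather than with the full size of the layer, which is the scalability argument in miniature, while different masks yield different effective weight matrices for different agents.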