In cooperative multi-agent reinforcement learning (MARL), environmental stochasticity and uncertainty grow exponentially with the number of agents, which makes it difficult to derive a compact latent representation from partial observations that can support value decomposition. To address this, we propose the UNit-wise attentive State Representation (UNSR), a simple yet powerful method that alleviates partial observability and promotes coordination efficiently. In UNSR, each agent learns a compact, disentangled unit-wise state representation output by transformer blocks and uses it to produce its local action-value function. These representations are then used with a multi-head attention mechanism in the mixing network to improve credit assignment during value decomposition, providing an efficient reasoning path between the individual value functions and the joint value function. Experimental results on the StarCraft II micromanagement challenge show that our method achieves superior performance and data efficiency compared with strong baselines. Ablation experiments further identify the key factors contributing to UNSR's performance.
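To make the idea of attention-based credit assignment in a mixing network concrete, the following is a minimal sketch, not the UNSR architecture itself. It assumes each attention head has a query vector (standing in for a global state embedding) and one key vector per agent (standing in for that agent's unit-wise representation); the function names, the averaging of heads, and the toy numbers are all hypothetical choices made for illustration. Because softmax weights are non-negative, the mixed value is monotone in every agent's local value, which is the property value-decomposition methods typically rely on.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention of one query over per-agent keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def mix_q_values(agent_qs, head_queries, head_keys):
    """Combine local Q-values into a joint value (illustrative assumption).

    Each head yields one non-negative weight per agent; heads are averaged,
    so the credit vector sums to 1 and Q_tot is a convex combination of the
    agents' local Q-values.
    """
    per_head = [attention_weights(q, keys)
                for q, keys in zip(head_queries, head_keys)]
    n_heads = len(per_head)
    credit = [sum(w[i] for w in per_head) / n_heads
              for i in range(len(agent_qs))]
    q_tot = sum(c * q for c, q in zip(credit, agent_qs))
    return q_tot, credit


# Toy example: 3 agents, 2 heads, 2-dimensional embeddings (made-up values).
agent_qs = [1.0, 2.0, 3.0]
head_queries = [[0.5, -0.5], [1.0, 1.0]]
head_keys = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.2, 0.1], [0.4, 0.3], [0.0, 0.5]],
]
q_tot, credit = mix_q_values(agent_qs, head_queries, head_keys)
print("credit per agent:", credit)
print("Q_tot:", q_tot)
```

In this sketch the credit vector plays the role of the mixing weights: increasing any single agent's local Q-value can never decrease the joint value. A learned model would produce the queries and keys from network outputs rather than fixed lists.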