Stochastic partial observability poses a major challenge for decentralized coordination in multi-agent reinforcement learning, but it is largely neglected in state-of-the-art research because of a strong focus on state-based centralized training for decentralized execution (CTDE) and on benchmarks that lack sufficient stochasticity, such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability. AERIAL replaces the true state with a learned representation of multi-agent recurrence, which considers more accurate information about decentralized agent decisions than state-based CTDE does. We then introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states, to provide a more general and configurable benchmark for stochastic partial observability. We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in MessySMAC.