Stochastic partial observability poses a major challenge for decentralized coordination in multi-agent reinforcement learning, but it is largely neglected in state-of-the-art research because of a strong focus on state-based centralized training for decentralized execution (CTDE) and on benchmarks that lack sufficient stochasticity, such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability. AERIAL replaces the true state with a learned representation of multi-agent recurrence, thereby considering more accurate information about decentralized agent decisions than state-based CTDE does. We then introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states, to provide a more general and configurable benchmark for stochastic partial observability. We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in MessySMAC.