Stochastic partial observability poses a major challenge for decentralized coordination in multi-agent reinforcement learning, but it is largely neglected in state-of-the-art research. This is due to a strong focus on state-based centralized training for decentralized execution (CTDE) and on benchmarks that lack sufficient stochasticity, such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability. AERIAL replaces the true state with a learned representation of multi-agent recurrence, which captures more accurate information about decentralized agent decisions than state-based CTDE does. We then introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states, to provide a more general and configurable benchmark for stochastic partial observability. We evaluate AERIAL on Dec-Tiger as well as on a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in MessySMAC.
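To make the core idea concrete, the sketch below replaces the true state with an attention-based embedding of the agents' recurrent hidden states and uses that embedding to condition a centralized value mixer. Several details are assumptions rather than statements from this abstract. The first is that a QMIX-style monotonic mixer serves as the value-factorization backbone. The second is single-head scaled dot-product attention followed by mean pooling, which makes the embedding independent of agent order. The third is that the hidden states are treated as fixed inputs; in a gradient-based implementation, one plausible choice would be to detach them. The names `RecurrenceAttentionEmbedding` and `MonotonicMixer` are hypothetical, and the weights are random and untrained, so this illustrates only the data flow and not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def elu(x):
    # ELU activation; np.minimum avoids overflow in expm1 for large positive x.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


class RecurrenceAttentionEmbedding:
    """Hypothetical: self-attention over per-agent recurrent hidden states."""

    def __init__(self, hidden_dim, embed_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        self.W_q = rng.normal(0.0, scale, (hidden_dim, embed_dim))
        self.W_k = rng.normal(0.0, scale, (hidden_dim, embed_dim))
        self.W_v = rng.normal(0.0, scale, (hidden_dim, embed_dim))
        self.embed_dim = embed_dim

    def __call__(self, hidden_states):
        # hidden_states: (n_agents, hidden_dim); assumed to be treated as
        # constants here (no gradient flows back into the agents' RNNs).
        Q = hidden_states @ self.W_q
        K = hidden_states @ self.W_k
        V = hidden_states @ self.W_v
        attn = softmax(Q @ K.T / np.sqrt(self.embed_dim), axis=-1)  # (n, n)
        per_agent = attn @ V                                        # (n, embed_dim)
        return per_agent.mean(axis=0)  # order-invariant joint embedding


class MonotonicMixer:
    """Hypothetical QMIX-style mixer conditioned on the embedding, not the state."""

    def __init__(self, n_agents, cond_dim, mix_dim=8, seed=1):
        rng = np.random.default_rng(seed)
        self.H_w1 = rng.normal(0.0, 0.1, (cond_dim, n_agents * mix_dim))
        self.H_b1 = rng.normal(0.0, 0.1, (cond_dim, mix_dim))
        self.H_w2 = rng.normal(0.0, 0.1, (cond_dim, mix_dim))
        self.H_b2 = rng.normal(0.0, 0.1, (cond_dim,))
        self.n, self.m = n_agents, mix_dim

    def __call__(self, agent_qs, cond):
        # Hypernetworks map the conditioning vector to mixing weights;
        # np.abs keeps weights non-negative, so Q_tot is monotone in each agent's Q.
        w1 = np.abs(cond @ self.H_w1).reshape(self.n, self.m)
        b1 = cond @ self.H_b1
        hidden = elu(agent_qs @ w1 + b1)
        w2 = np.abs(cond @ self.H_w2)
        b2 = cond @ self.H_b2
        return float(hidden @ w2 + b2)


# Usage with random stand-ins for the agents' RNN hidden states and chosen-action Q-values.
n_agents, hidden_dim, embed_dim = 3, 16, 8
rng = np.random.default_rng(42)
h = rng.normal(size=(n_agents, hidden_dim))
q = rng.normal(size=n_agents)

embed = RecurrenceAttentionEmbedding(hidden_dim, embed_dim)
mixer = MonotonicMixer(n_agents, cond_dim=embed_dim)
q_tot = mixer(q, embed(h))  # joint value conditioned on the recurrence embedding
```

Mean pooling over the attention outputs is one simple way to obtain a fixed-size, permutation-invariant input for the mixer. The monotonicity constraint is inherited from the assumed QMIX backbone and is independent of whether the conditioning input is the true state or a learned embedding.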