State uncertainty poses a major challenge for decentralized coordination but is largely neglected in state-of-the-art research due to a strong focus on state-based centralized training for decentralized execution (CTDE) and on benchmarks that lack sufficient stochasticity, such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under agent-wise state uncertainty. AERIAL replaces the true state with a learned representation of multi-agent recurrence, which captures more accurate information about decentralized agent decisions than state-based CTDE. We then introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states, to provide a more general and configurable benchmark for state uncertainty. We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various state uncertainty configurations in MessySMAC.
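To make the idea of an attention-based embedding of multi-agent recurrence more concrete, the following is a minimal sketch, not the architecture used in the paper. It assumes that each agent's recurrent network exposes a hidden-state vector, that a single head of scaled dot-product self-attention is applied across agents, and that the result is mean-pooled into one vector that a centralized value function could consume in place of the true state. The function names (`attention_embedding`, `matvec`, `softmax`), the projection matrices `Wq`, `Wk`, `Wv`, and all dimensions are hypothetical choices made for illustration; in practice the projections would be learned.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attention_embedding(hidden_states, Wq, Wk, Wv):
    """Single-head self-attention over per-agent hidden states, mean-pooled.

    hidden_states: one vector per agent (assumed to come from each agent's RNN).
    Wq, Wk, Wv: hypothetical projection matrices (learned in a real system).
    Returns one vector that stands in for the true state during training.
    """
    queries = [matvec(Wq, h) for h in hidden_states]
    keys = [matvec(Wk, h) for h in hidden_states]
    values = [matvec(Wv, h) for h in hidden_states]
    scale = math.sqrt(len(keys[0]))

    attended = []
    for q in queries:
        # Similarity of this agent's query to every agent's key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of all agents' value vectors.
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])

    # Mean-pool across agents to obtain a fixed-size joint embedding.
    n = len(attended)
    return [sum(row[j] for row in attended) / n for j in range(len(attended[0]))]


# Illustrative usage with random (untrained) weights: 3 agents, hidden size 4.
rng = random.Random(0)
hidden_dim, embed_dim, num_agents = 4, 2, 3
make_matrix = lambda: [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
                       for _ in range(embed_dim)]
Wq, Wk, Wv = make_matrix(), make_matrix(), make_matrix()
agent_hidden = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
                for _ in range(num_agents)]
embedding = attention_embedding(agent_hidden, Wq, Wk, Wv)
```

The embedding depends only on what the agents themselves have encoded from their local histories, which is the sense in which it reflects decentralized decisions rather than privileged state information. Mean pooling is one simple way to obtain a fixed-size output; other aggregation schemes are equally plausible.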