Multi-agent reinforcement learning (MARL) has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory, a common occurrence in real-world environments such as search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, so that observations encountered during OOD inference emulate the in-domain observations seen in training. The results show that FlickerFusion not only achieves superior inference rewards compared to existing methods but also uniquely reduces uncertainty relative to the backbone. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
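To make the idea of stochastically dropping parts of the observation space concrete, the following is a minimal sketch and not the released FlickerFusion implementation. It assumes, purely for illustration, that an observation is a list of per-entity feature vectors, that dropping happens at the level of whole entities, and that a fresh random subset is drawn at every step. The function `drop_entities` and the parameter `max_in_domain` (the largest entity count seen in training) are hypothetical names.

```python
import random


def drop_entities(entity_obs, max_in_domain, rng):
    """Randomly keep at most `max_in_domain` entities from an observation.

    entity_obs: list of per-entity feature vectors (assumed layout).
    max_in_domain: largest entity count seen during training (assumed).
    rng: a random.Random instance, so the sampling is reproducible.
    Returns the retained entities (original order kept) and their indices.
    """
    num_entities = len(entity_obs)
    if num_entities <= max_in_domain:
        # Already within the training range: nothing to drop.
        keep = list(range(num_entities))
    else:
        # Sample without replacement, then sort to preserve entity order.
        keep = sorted(rng.sample(range(num_entities), max_in_domain))
    return [entity_obs[i] for i in keep], keep


if __name__ == "__main__":
    rng = random.Random(0)
    # Six entities with two features each; training saw at most four.
    obs = [[float(2 * i), float(2 * i + 1)] for i in range(6)]
    for step in range(3):
        # A new subset is drawn each step (an assumption of this sketch).
        kept_obs, kept_idx = drop_entities(obs, max_in_domain=4, rng=rng)
        print(step, kept_idx)
```

Under these assumptions, the backbone policy only ever receives an entity count it encountered during training, while the resampling at each step means that different entities are visible over the course of a trajectory.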