In recent years, multi-agent reinforcement learning has achieved remarkable performance in multi-agent planning and scheduling tasks. It typically follows the self-play setting, in which agents are trained by playing with a fixed group of agents. However, self-play agents may fail in zero-shot coordination, where an agent must coordinate with partners it has never seen. Several methods have been proposed to handle this problem, but they are either time-consuming or lack generalizability. In this paper, we first reveal an important phenomenon: zero-shot coordination performance is strongly linearly correlated with the similarity between an agent's training partner and its testing partner. Inspired by this finding, we put forward a Similarity-Based Robust Training (SBRT) scheme that improves agents' zero-shot coordination performance by perturbing their partners' actions during training according to a pre-defined policy similarity value. To validate its effectiveness, we apply our scheme to three multi-agent reinforcement learning frameworks and achieve better performance than previous methods.
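To make the idea of perturbing a partner's actions according to a pre-defined similarity value concrete, the following is a minimal sketch and not the SBRT implementation itself. It assumes a discrete action space and assumes that similarity can be read as the probability that the perturbed partner takes the same action as the original partner. The function name `perturb_partner_action` and its parameters are hypothetical and introduced only for illustration.

```python
import random


def perturb_partner_action(action, n_actions, similarity, rng=random):
    """Illustrative only: keep the partner's action with probability
    `similarity`; otherwise replace it with a different action drawn
    uniformly from the remaining ones.

    Assumption (not from the paper): similarity is the per-step
    probability that the perturbed action equals the original one.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if not 0 <= action < n_actions:
        raise ValueError("action must be an index in [0, n_actions)")
    # With a single action there is nothing to swap in.
    if n_actions < 2 or rng.random() < similarity:
        return action
    # Draw from the n_actions - 1 alternatives, skipping the original index.
    other = rng.randrange(n_actions - 1)
    return other if other < action else other + 1


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [perturb_partner_action(1, 4, 0.7, rng) for _ in range(10000)]
    agreement = sum(a == 1 for a in samples) / len(samples)
    print(f"empirical agreement with the original action: {agreement:.3f}")
```

Under this assumption, a similarity of 1.0 leaves the partner unchanged (plain self-play), while lower values expose the learning agent to partners that deviate more often from the behaviour it was trained with.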