Training agents in multi-agent competitive games is challenging because the game dynamics depend not only on the environment but also on the opponents' strategies. Existing methods often suffer from slow convergence and instability. To address this, we use imitation learning to model and anticipate opponents' behavior, with the aim of reducing uncertainty about the game dynamics. Our key contributions are: (i) a new multi-agent imitation learning model for predicting the opponents' next moves, which works when the opponents' actions are hidden and only local observations are available; (ii) a new multi-agent reinforcement learning (RL) algorithm that combines our imitation learning model and policy training in a single training process; and (iii) extensive experiments in three challenging game environments, including an advanced version of the StarCraft Multi-Agent Challenge (i.e., SMACv2). Experimental results show that our approach outperforms existing state-of-the-art multi-agent RL algorithms.