Cooperative multi-agent reinforcement learning (MARL) has become an increasingly important research topic over the past five years because of its great potential for real-world applications. Because of the curse of dimensionality, the popular "centralized training, decentralized execution" framework requires long training times and still struggles to converge efficiently. In this paper, we propose a general training framework, MARL-LNS, that algorithmically addresses these issues by training on alternating subsets of agents, using existing deep MARL algorithms as low-level trainers and introducing no additional trainable parameters. Based on this framework, we provide three algorithm variants that alternate the subsets of agents in different ways: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS). We evaluate our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that they automatically reduce training time by at least 10% while reaching the same final skill level as the original algorithm.
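The core idea is to train only a subset (a "neighborhood") of agents at each iteration while the rest stay frozen, and the three variants differ only in how that subset is chosen. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The `Trainer` interface is hypothetical (it stands in for one update of an existing deep MARL algorithm restricted to the chosen agents). The batch-cycling rule used for BLNS and the grow-on-stagnation rule used for ALNS are illustrative assumptions, not rules taken from the paper.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence

# Hypothetical interface: updates only the agents in `subset` (others frozen)
# using some existing deep MARL algorithm, then returns an evaluation score.
Trainer = Callable[[Sequence[int]], float]


def random_neighborhoods(n_agents: int, size: int,
                         rng: random.Random) -> Iterator[List[int]]:
    """RLNS-style: draw an independent random subset of `size` agents each iteration."""
    while True:
        yield sorted(rng.sample(range(n_agents), size))


def batch_neighborhoods(n_agents: int, size: int,
                        rng: random.Random) -> Iterator[List[int]]:
    """BLNS-style (assumed rule): shuffle agents, split them into consecutive
    batches of `size`, and cycle through the batches so that every agent is
    trained exactly once per pass."""
    agents = list(range(n_agents))
    while True:
        rng.shuffle(agents)
        for start in range(0, n_agents, size):
            yield sorted(agents[start:start + size])


def adaptive_lns(n_agents: int, trainer: Trainer, iterations: int, init_size: int,
                 patience: int = 3, rng: Optional[random.Random] = None) -> List[int]:
    """ALNS-style (assumed rule): sample random neighborhoods, and enlarge the
    neighborhood by one agent whenever the score has not improved for
    `patience` consecutive iterations. Returns the size used at each iteration."""
    if not 1 <= init_size <= n_agents:
        raise ValueError("init_size must be between 1 and n_agents")
    rng = rng or random.Random(0)
    size, best, stale, sizes = init_size, float("-inf"), 0, []
    for _ in range(iterations):
        subset = sorted(rng.sample(range(n_agents), size))
        score = trainer(subset)  # only agents in `subset` are updated
        sizes.append(size)
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience and size < n_agents:
                size, stale = size + 1, 0
    return sizes


if __name__ == "__main__":
    trained_counts = [0] * 6

    def dummy_trainer(subset: Sequence[int]) -> float:
        # Stand-in for real training: just count how often each agent is updated.
        for agent in subset:
            trained_counts[agent] += 1
        return 0.0  # flat score, so the ALNS sketch keeps growing the neighborhood

    batches = batch_neighborhoods(6, 2, random.Random(42))
    for _ in range(3):  # one full pass over the 3 batches
        dummy_trainer(next(batches))
    print(trained_counts)  # -> [1, 1, 1, 1, 1, 1]

    print(adaptive_lns(6, dummy_trainer, iterations=10, init_size=2))
```

Because the low-level trainer is passed in as a plain callable, the subset-selection logic stays independent of whichever MARL algorithm does the actual updates, which mirrors the abstract's point that the framework adds no trainable parameters of its own.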