Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large joint policy space and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach that uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks. In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We show empirically that such approaches can greatly reduce the number of timesteps required to solve a complex target task relative to training from scratch. However, we also identify and investigate two problems with naive implementations of approaches based on sub-task decomposition, and we propose a simple and scalable method that augments existing actor-critic algorithms to address them. We demonstrate the empirical benefits of our proposed method, which enables sub-task decomposition approaches to be deployed in diverse multi-agent tasks.