Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large policy space and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach that uses an expert-provided curriculum of simpler multi-agent sub-tasks. In each sub-task of the curriculum, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We present MEDoE, a flexible method that identifies situations in the target task where each agent can use its sub-task-specific skills, and uses this information to modulate hyperparameters for learning and exploration during the fine-tuning process. We compare MEDoE to multi-agent reinforcement learning baselines that train from scratch in the full task, and to naïve applications of standard multi-agent reinforcement learning techniques for fine-tuning. We show that MEDoE outperforms both kinds of baseline, requiring significantly fewer total training timesteps to solve a range of complex teamwork tasks.
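The abstract does not specify how the hyperparameters are modulated, so the following is only a minimal sketch of the general idea, not the MEDoE implementation. It assumes a hypothetical per-agent expertise score in [0, 1] (1 meaning the agent is in a situation covered by its sub-task skills) and assumes that both the learning rate and an exploration temperature are scaled down linearly as that score rises. The names `modulate`, `ModulatedHyperparams`, `min_scale`, and the default values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ModulatedHyperparams:
    learning_rate: float
    exploration_temperature: float


def modulate(expertise: float,
             base_lr: float = 3e-4,
             base_temp: float = 1.0,
             min_scale: float = 0.1) -> ModulatedHyperparams:
    """Scale learning and exploration by a hypothetical expertise score.

    expertise = 0.0: unfamiliar situation, use the base values unchanged.
    expertise = 1.0: familiar situation, shrink both values to min_scale
    of their base so the transferred skill is largely preserved.
    The linear interpolation is an assumption made for this sketch.
    """
    if not 0.0 <= expertise <= 1.0:
        raise ValueError("expertise must lie in [0, 1]")
    scale = 1.0 - (1.0 - min_scale) * expertise
    return ModulatedHyperparams(
        learning_rate=base_lr * scale,
        exploration_temperature=base_temp * scale,
    )
```

Under these assumptions, an agent acting in a situation it already handles well would update its policy slowly and explore little, while an agent facing a situation outside its sub-task training would learn and explore at the full base rates.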