While there has been significant progress in curriculum learning and continuous learning for training agents to generalize across a wide variety of environments in single-agent reinforcement learning, it is unclear whether these algorithms remain valid in a multi-agent setting. In a competitive setting, a learning agent can be trained by having it compete against a curriculum of increasingly skilled opponents. However, a generally intelligent agent should also be able to learn to act around other agents and cooperate with them to achieve common goals. When cooperating with other agents, the learning agent must (a) learn how to perform the task (or subtask), and (b) increase the overall team reward. In this paper, we ask what kind of cooperative teammate, and what curriculum of teammates, a learning agent should be trained with to achieve these two objectives. Our results on the game Overcooked show that a less skilled pre-trained teammate is the best teammate for overall team reward but the worst for the agent's own learning. Moreover, somewhat surprisingly, a curriculum of teammates with decreasing skill levels performs better than other types of curricula.