Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, these systems perform poorly in any setting other than the one they were trained on, and they struggle to cooperate with unfamiliar collaborators. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative card game that requires complex reasoning and precise assistance to other agents. Current MARL agents for Hanabi can only learn one specific game setting (e.g., 2-player games) and can only play with the same algorithmic agents. This is in stark contrast to humans, who can quickly adjust their strategies to work with unfamiliar partners or situations. In this paper, we introduce Recurrent Replay Relevance Distributed DQN (R3D2), a generalist agent for Hanabi designed to overcome these limitations. We reformulate the task using text, as language has been shown to improve transfer. We then propose a distributed MARL algorithm that copes with the resulting dynamic observation and action spaces. As a result, our agent is the first that can play all game settings concurrently and extend strategies learned from one setting to others. Consequently, our agent can also collaborate with different algorithmic agents, which are themselves unable to do so. The implementation code is available at: $\href{https://github.com/chandar-lab/R3D2-A-Generalist-Hanabi-Agent}{R3D2-A-Generalist-Hanabi-Agent}$