A major challenge for deep reinforcement learning (DRL) agents is collaborating with novel partners that they did not encounter during training. The challenge is exacerbated when DRL agents collaborate with human partners, because the inconsistency of human behavior leads to greater variance in action responses. Recent work has shown that training a single agent as the best response to a diverse population of training partners significantly increases the agent's robustness to novel partners. We further enhance this population-based training approach by introducing a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration. Our agent learns multiple best-response policies as its low-level policies while simultaneously learning a high-level policy that acts as a manager, allowing the agent to dynamically switch between the low-level best-response policies based on its current partner. We demonstrate that our method dynamically adapts to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment. We also conducted a human study in the same environment to test the effectiveness of our method when partnering with real human subjects.
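The manager/worker structure described in the abstract can be illustrated with a minimal sketch of its control flow. This is a hypothetical example, not the authors' implementation, and it shows no training: the names `HierarchicalAgent` and `toy_manager`, the fixed `switch_interval` at which the manager re-decides, and the idea that the manager reads a short window of recent observations are all assumptions made for illustration. The low-level "best-response" policies are stand-in callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Observation = Sequence[float]
Action = int


@dataclass
class HierarchicalAgent:
    """Hypothetical sketch: a high-level manager chooses which low-level
    best-response policy controls the agent, re-deciding every
    `switch_interval` steps (a fixed interval is an assumption)."""

    low_level_policies: List[Callable[[Observation], Action]]
    # Assumed interface: manager maps a window of recent observations to a policy index.
    manager: Callable[[List[Observation]], int]
    switch_interval: int = 10
    _history: List[Observation] = field(default_factory=list, init=False)
    _active: int = field(default=0, init=False)
    _t: int = field(default=0, init=False)

    def act(self, obs: Observation) -> Action:
        self._history.append(obs)
        # At each manager decision point, pick a low-level policy from recent context.
        if self._t % self.switch_interval == 0:
            idx = self.manager(self._history[-self.switch_interval:])
            if not 0 <= idx < len(self.low_level_policies):
                raise ValueError(f"manager returned invalid policy index {idx}")
            self._active = idx
        self._t += 1
        # The currently selected low-level policy produces the primitive action.
        return self.low_level_policies[self._active](obs)


# Toy usage: two stand-in policies that always emit action 0 or action 1.
policies = [lambda obs: 0, lambda obs: 1]


def toy_manager(window: List[Observation]) -> int:
    # Hypothetical rule: use policy 1 if the partner feature averages above 0.5.
    mean = sum(o[0] for o in window) / len(window)
    return 1 if mean > 0.5 else 0


agent = HierarchicalAgent(policies, toy_manager, switch_interval=3)
actions = [agent.act([x]) for x in [0.9, 0.1, 0.1, 0.1, 0.1, 0.1]]
print(actions)  # [1, 1, 1, 0, 0, 0]
```

In this sketch the manager commits to one policy for `switch_interval` steps before reconsidering, which is one common way to give a high-level policy temporal abstraction; in a learned system both the manager and the low-level policies would be trained rather than hand-written.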