Zero-shot human-AI coordination promises agents that can collaborate with humans without being trained on human data. Prevailing methods train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) the diversity of a population with a finite number of partners is limited, which in turn limits the trained ego agent's capacity to collaborate with a novel human; 2) current methods learn only a single common best response to every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or with humans. To address these issues, we first propose a policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method that enables the ego agent to analyze and identify the partner's potential policy primitives so that it can act differently depending on the partner. In this way, the ego agent learns more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments in the Overcooked environment and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than the baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework for Overcooked to facilitate future studies.
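The two ideas above can be illustrated with a minimal sketch. It is not the implementation behind the reported results. It assumes, purely for illustration, that a new partner is formed as a convex combination of the action distributions of a few policy primitives, and that the ego agent identifies the primitives by keeping a Bayesian posterior over them given the partner's observed actions. The toy `PRIMITIVES` and the helpers `sample_weights`, `ensemble_partner`, and `primitive_posterior` are hypothetical names introduced here.

```python
import math
import random

# Hypothetical toy primitives: each maps an observation to a distribution
# over three actions. Real primitives would be trained policies.
PRIMITIVES = [
    lambda obs: [0.8, 0.1, 0.1],
    lambda obs: [0.1, 0.8, 0.1],
    lambda obs: [0.1, 0.1, 0.8],
]


def sample_weights(k, rng):
    """Draw k mixture weights uniformly from the simplex (Dirichlet(1, ..., 1))."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def ensemble_partner(primitives, weights):
    """Build a partner whose action distribution is a weighted mix of primitives.

    Assumption: mixing is a convex combination of action probabilities.
    """
    def policy(obs):
        dists = [p(obs) for p in primitives]
        n_actions = len(dists[0])
        return [
            sum(w * d[a] for w, d in zip(weights, dists))
            for a in range(n_actions)
        ]
    return policy


def primitive_posterior(primitives, history):
    """Posterior over primitives given (observation, action) pairs.

    Assumption: uniform prior and actions that are independent given the
    primitive. Computed in log space for numerical stability.
    """
    log_post = [0.0] * len(primitives)
    for obs, action in history:
        for i, p in enumerate(primitives):
            log_post[i] += math.log(p(obs)[action])
    peak = max(log_post)
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Usage: sample one ensemble partner, then infer primitives from a short history.
rng = random.Random(0)
partner = ensemble_partner(PRIMITIVES, sample_weights(len(PRIMITIVES), rng))
history = [(None, 1), (None, 1), (None, 0)]  # partner mostly chose action 1
belief = primitive_posterior(PRIMITIVES, history)
```

Sampling fresh weights for each episode yields an unbounded set of partners from a finite set of primitives, which is the intuition behind using an ensemble to increase diversity. The posterior `belief` is one simple form of context that an ego policy could condition on instead of playing a single common best response.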