Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) a population with a finite number of partners has limited diversity, which limits the capacity of the trained ego agent to collaborate with a novel human; 2) current methods provide only a single common best response to every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or human. To address these issues, we first propose a policy ensemble method that increases the diversity of partners in the population, and then develop a context-aware method that enables the ego agent to analyze and identify the partner's potential policy primitives so that it can act accordingly. In this way, the ego agent learns more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments in the Overcooked environment and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than the baselines, achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework built on Overcooked for the convenience of future studies.
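To make the two ideas above concrete, the following is a minimal illustrative sketch, not the method's actual implementation. It assumes that each policy primitive is a fixed action distribution, that an ensemble partner is a weighted mixture of primitives, and that the ego agent tracks a Bayesian belief over which primitive the partner is following. The function names `ensemble_policy`, `sample_partner`, and `update_belief` are hypothetical and chosen only for illustration.

```python
import random


def ensemble_policy(primitive_probs, weights):
    """Mix K primitive action distributions into one partner policy.

    primitive_probs: list of K action distributions (each sums to 1).
    weights: K non-negative mixture weights (normalized here).
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_actions = len(primitive_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(norm, primitive_probs))
        for a in range(n_actions)
    ]


def sample_partner(primitive_probs, rng):
    """Draw random mixture weights to create a new ensemble partner.

    Normalized Gamma(1, 1) draws give uniform Dirichlet weights; this
    sampling scheme is an assumption made for illustration.
    """
    weights = [rng.gammavariate(1.0, 1.0) for _ in primitive_probs]
    return ensemble_policy(primitive_probs, weights), weights


def update_belief(belief, primitive_probs, observed_action):
    """Bayes update of the belief over primitives after one partner action."""
    posterior = [
        b * probs[observed_action]
        for b, probs in zip(belief, primitive_probs)
    ]
    total = sum(posterior)
    if total == 0.0:
        # No primitive explains the action: fall back to a uniform belief.
        return [1.0 / len(belief)] * len(belief)
    return [p / total for p in posterior]


if __name__ == "__main__":
    primitives = [[0.9, 0.1], [0.1, 0.9]]  # two toy primitives, two actions
    partner, weights = sample_partner(primitives, random.Random(0))
    print("ensemble partner:", partner)
    print("belief after action 0:", update_belief([0.5, 0.5], primitives, 0))
```

In this toy setting, an ego policy could condition on the belief vector as its context, so that it responds differently to different inferred primitives instead of learning one common best response.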