Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) a population with a finite number of partners has limited diversity, which limits the trained ego agent's capacity to collaborate with a novel human; 2) current methods learn only a single common best response to every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or human. To address these issues, we first propose a policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method that enables the ego agent to analyze and identify the partner's potential policy primitives so that it can act differently in response. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments in the Overcooked environment and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework built on Overcooked for the convenience of future studies.