We introduce the Overcooked Generalisation Challenge (OGC), the first benchmark to study agents' zero-shot cooperation abilities when faced with novel partners and levels in the Overcooked-AI environment. This perspective contrasts starkly with a large body of previous work that has trained and evaluated cooperating agents only on the same level, failing to capture the generalisation abilities required for real-world human-AI cooperation. Our challenge interfaces with state-of-the-art dual curriculum design (DCD) methods to generate auto-curricula for training general agents in Overcooked. It is the first cooperative multi-agent environment specifically designed for DCD methods and, consequently, the first to be benchmarked with state-of-the-art methods. It is fully GPU-accelerated, built on the DCD benchmark suite minimax, and freely available under an open-source license: https://git.hcics.simtech.uni-stuttgart.de/public-projects/OGC. We show that current DCD algorithms struggle to produce useful policies in this novel challenge, even when combined with recent network architectures designed for scalability and generalisability. The OGC pushes the boundaries of real-world human-AI cooperation by enabling the research community to study the impact of generalisation on cooperating agents.