Learning to collaborate with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches often adopt a two-stage pipeline. First, a fixed population of teammates is generated that is intended to be representative of the teammates encountered at deployment time. Second, an AHT agent is trained to collaborate well with the agents in this population. To date, the research community has focused on designing separate algorithms for each stage. This separation has produced teammate-generation algorithms whose teammates cover only a limited range of possible behaviors, and which ignore whether the AHT agent can easily learn from the generated teammates. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static: they attempt to generalize to previously unseen partners without assuming any control over the set of training teammates. This paper presents a unified framework for AHT by reformulating the problem as an open-ended learning process between an AHT agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Experiments across diverse two-player environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
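To build intuition for the alternating, regret-driven loop described above, the toy sketch below replaces learned policies with a finite table of expected returns between candidate AHT agents (rows) and candidate teammates (columns). It is a double-oracle-style minimax-regret procedure, not ROTATE itself. The return table, the functions `regret` and `regret_driven_loop`, the regret definition (best return any candidate agent achieves with a teammate, minus this agent's return), and the selection rules are all hypothetical assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical return table: returns[a][t] is the expected return when
# candidate AHT agent `a` plays with candidate teammate `t`.
Table = List[List[float]]


def regret(returns: Table, agent: int, teammate: int) -> float:
    """Regret of `agent` with `teammate` (assumed definition): the best return
    any candidate agent achieves with that teammate, minus this agent's return."""
    best = max(row[teammate] for row in returns)
    return best - returns[agent][teammate]


def regret_driven_loop(returns: Table, start_teammate: int = 0,
                       max_iters: int = 100) -> Tuple[int, List[int]]:
    """Alternate an agent step and a teammate-generator step until the generator
    can no longer find a teammate outside the population with higher regret."""
    n_agents, n_teammates = len(returns), len(returns[0])
    population = [start_teammate]
    agent = 0
    for _ in range(max_iters):
        # Agent step: choose the agent with the smallest worst-case regret
        # over the current training population of teammates.
        agent = min(range(n_agents),
                    key=lambda a: max(regret(returns, a, t) for t in population))
        # Generator step: choose the teammate that maximizes the current
        # agent's regret, i.e. the one that best exposes its deficiencies.
        adversary = max(range(n_teammates),
                        key=lambda t: regret(returns, agent, t))
        if adversary in population:
            break  # no new teammate increases the agent's worst-case regret
        population.append(adversary)
    return agent, population


# Teammates 0..2 each follow a different convention; agents 0..2 are
# specialists for one convention, and agent 3 is an adaptive generalist.
RETURNS: Table = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.8, 0.8, 0.8],
]

agent, population = regret_driven_loop(RETURNS)
print(agent, population)  # 3 [0, 1]
```

With these numbers, the loop first selects specialist agent 0 for teammate 0. The generator then adds teammate 1, with which agent 0 has regret 1.0, and the generalist agent 3 becomes the minimax-regret choice. Teammate 2 is never added because it exposes no larger regret than the current population already does, which mirrors the idea of generating only teammates that probe the agent's deficiencies.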