Ad hoc teamwork poses a challenging problem: designing an agent that can collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising solution is to leverage the generalizability of graph neural networks to handle an unrestricted number of agents and thereby address open teams, an approach known as graph-based policy learning (GPL). However, GPL's joint Q-value representation over a coordination graph lacks a convincing theoretical justification. In this paper, we establish a new theory that interprets the joint Q-value representation from the perspective of cooperative game theory, and we validate its learning paradigm in open team settings. Building on this theory, we propose a novel algorithm named CIAO that is compatible with the GPL framework and includes additional provable implementation tricks that facilitate learning. A demo of the experiments is available at https://sites.google.com/view/ciao2024, and the code is published at https://github.com/hsvgbkhgbv/CIAO.
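For context, the joint Q-value in GPL-style methods is typically factorized over a coordination graph into individual utilities and pairwise payoff terms; a minimal sketch of this form (notation is illustrative and not taken from the paper, with N denoting the current set of agents and E the edges of the coordination graph) is:

% Illustrative coordination-graph factorization assumed by GPL-style methods;
% N is the (possibly changing) set of agents, E the coordination-graph edges.
\begin{equation}
  Q(s, \mathbf{a}) \;=\; \sum_{i \in N} Q_i(a_i \mid s)
  \;+\; \sum_{(i,j) \in E} Q_{ij}(a_i, a_j \mid s).
\end{equation}

Because the individual and pairwise terms are computed per agent and per edge, this representation accommodates a variable number of teammates, which is what makes it suited to open teams; the paper's contribution is to ground this representation in cooperative game theory.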