Ad hoc teamwork poses a challenging problem: designing an agent that collaborates with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates; such teams are referred to as open teams. One promising practical solution, graph-based policy learning (GPL), leverages the generalizability of graph neural networks to handle an unrestricted number of agents of various agent types. However, its representation of the joint Q-value over a coordination graph lacks a convincing explanation. In this paper, we establish a new theory, through the lens of cooperative game theory, for understanding the representation of the joint Q-value in OAHT and its learning paradigm. Building on this theory, we propose a novel algorithm named CIAO, based on GPL's framework, with additional provable implementation tricks that facilitate learning. Demos of the experimental results are available at https://sites.google.com/view/ciao2024, and the experiment code is published at https://github.com/hsvgbkhgbv/CIAO.