Ad hoc teamwork poses a challenging problem: designing an agent that can collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. A promising practical solution to this problem leverages the generalizability of graph neural networks to handle an unrestricted number of agents, known as graph-based policy learning (GPL). However, its joint Q-value representation over a coordination graph lacks a convincing explanation. In this paper, we establish a new theory to understand the joint Q-value representation for OAHT from the perspective of cooperative game theory, and validate its learning paradigm. Building on our theory, we propose a novel algorithm named CIAO, compatible with the GPL framework, with additional provable implementation tricks that facilitate learning. Demos of the experimental results are available at https://sites.google.com/view/ciao2024, and the experiment code is published at https://github.com/hsvgbkhgbv/CIAO.
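To make the coordination-graph idea concrete, the following is a minimal sketch (not the paper's implementation) of why a graph-based joint Q-value accommodates open teams: assuming, as in the GPL literature, that the joint value decomposes into per-agent utilities plus pairwise utilities over the graph's edges, the same computation applies for any number of agents. The function name `joint_q_value` and the scalar utilities are illustrative.

```python
def joint_q_value(individual_q, pairwise_q, edges):
    """Sketch of a coordination-graph joint Q-value: a sum of
    singleton (per-agent) utilities and pairwise utilities over
    the graph's edges. Because the sum ranges over whatever agents
    and edges are present, it is defined for teams of any size."""
    q = sum(individual_q)                                  # one term per agent
    q += sum(pairwise_q[(i, j)] for (i, j) in edges)       # one term per edge
    return q

# A team of 3 agents with a chain-shaped coordination graph 0-1-2.
indiv = [1.0, 0.5, -0.2]
pair = {(0, 1): 0.3, (1, 2): -0.1}
print(joint_q_value(indiv, pair, [(0, 1), (1, 2)]))        # 1.5

# An agent leaves the open team: the same function handles the
# smaller team with no structural change.
print(joint_q_value(indiv[:2], {(0, 1): 0.3}, [(0, 1)]))   # 1.8
```

In GPL these utilities are produced by graph neural networks conditioned on agent observations, which is what lets a single learned model generalize across team sizes; the sketch above only shows the aggregation step.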