Ad hoc teamwork is a challenging problem: it requires designing an agent that can collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork complicates this further by considering environments in which the number of teammates changes, referred to as open teams. The state-of-the-art solution is graph-based policy learning (GPL), which leverages the generalizability of graph neural networks to handle an unrestricted number of agents and thereby address open teams. GPL outperforms other methods, but its joint Q-value representation is difficult to interpret, which hinders both further development of this line of research and its applicability. In this paper, we establish a new theory that interprets the joint Q-value representation employed in GPL from the perspective of cooperative game theory. Building on this theory, we propose a novel algorithm based on the GPL framework that supplies critical features which facilitate learning but are overlooked in GPL. Through experiments, we demonstrate the correctness of our theory by comparing the performance of the resulting algorithm with that of GPL under dynamic team compositions.