Covering skill (a.k.a. option) discovery was developed to improve exploration in single-agent reinforcement learning (RL) with sparse reward signals by connecting the most distant states in the embedding space given by the Fiedler vector of the state transition graph. Because the joint state space grows exponentially with the number of agents in multi-agent systems, existing approaches that still rely on single-agent option discovery either become computationally prohibitive or fail to directly discover joint options that improve the connectivity of the joint state space. In this paper, we show how to directly compute multi-agent options with collaborative exploratory behaviors while still enjoying the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, which lets us estimate its Fiedler vector directly from the Laplacian spectra of the individual agents' transition graphs. Since computing the Laplacian spectrum directly is intractable for tasks with infinite-scale state spaces, we further propose a deep learning extension of our method that estimates eigenfunctions through neural-network-based representation learning techniques. Evaluation on multi-agent tasks built with simulators such as MuJoCo shows that the proposed algorithm successfully identifies multi-agent options and significantly outperforms the state of the art. Code is available at: https://github.itap.purdue.edu/Clan-labs/Scalable_MAOD_via_KP.
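The Kronecker-graph idea in the abstract can be illustrated with a small numerical sketch. It relies on a standard linear-algebra fact rather than on the paper's exact estimator, which the abstract does not spell out: if the joint adjacency matrix is the Kronecker product A1 ⊗ A2, then its symmetrically normalized adjacency is N1 ⊗ N2, so every eigenpair of the joint normalized Laplacian I - N1 ⊗ N2 can be assembled from the per-agent eigenpairs without ever forming the joint matrix. The toy graphs and the helper names `path_with_self_loops` and `factored_fiedler` are assumptions made for illustration only.

```python
import numpy as np


def normalized_adjacency(A):
    """Return D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def path_with_self_loops(n):
    """Toy single-agent transition graph (hypothetical): an n-state path
    where each state can also stay in place."""
    A = np.eye(n)
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A


def factored_fiedler(A1, A2):
    """Fiedler value and vector of the normalized Laplacian of kron(A1, A2),
    computed only from the two small per-agent eigendecompositions."""
    mu, U = np.linalg.eigh(normalized_adjacency(A1))
    nu, V = np.linalg.eigh(normalized_adjacency(A2))
    # Eigenvalues of N1 kron N2 are all pairwise products mu_i * nu_j,
    # so the joint Laplacian eigenvalues are 1 - mu_i * nu_j.
    lam = 1.0 - np.outer(mu, nu)
    order = np.argsort(lam, axis=None)
    # Index 0 is the trivial eigenvalue (approximately 0); index 1 is the
    # second-smallest, i.e. the Fiedler value.
    i, j = np.unravel_index(order[1], lam.shape)
    return lam[i, j], np.kron(U[:, i], V[:, j])


A1 = path_with_self_loops(4)  # agent 1: 4 states
A2 = path_with_self_loops(3)  # agent 2: 3 states
fiedler_value, fiedler_vector = factored_fiedler(A1, A2)

# Joint states with the smallest and largest Fiedler-vector entries are the
# "most distant" pair that a covering option would try to connect.
start, goal = int(np.argmin(fiedler_vector)), int(np.argmax(fiedler_vector))
print("Fiedler value:", fiedler_value)
print("joint states to connect:", divmod(start, 3), divmod(goal, 3))
```

In this sketch the cost is two small eigendecompositions (sizes 4 and 3) instead of one of size 12, which is the decomposition benefit the abstract refers to. The self-loops keep each toy graph non-bipartite, so the product graph stays connected and the Fiedler value is strictly positive.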