Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent advances have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in policy-space response oracles (PSRO) for improving scalability when solving large-scale PEGs. However, these methods focus on specific PEGs with fixed initial conditions, whereas initial conditions may vary substantially in real-world scenarios, which significantly hinders the applicability of traditional methods. To address this issue, we introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs. Our contributions are threefold. First, we present a novel architecture that offers high-quality solutions for diverse PEGs, comprising two critical components: (i) a graph neural network (GNN) that encodes PEGs into hidden vectors, and (ii) a hypernetwork that generates pursuer policies conditioned on these hidden vectors. Second, we develop an efficient three-stage training method involving (i) a pre-pretraining stage that learns robust PEG representations via self-supervised graph learning techniques such as GraphMAE, (ii) a pre-training stage that applies heuristic-guided multi-task pre-training (HMP), in which heuristic-derived reference policies (e.g., computed with Dijkstra's algorithm) regularize the pursuer policies, and (iii) a fine-tuning stage that employs PSRO to generate pursuer policies for designated PEGs. Finally, we perform extensive experiments on synthetic and real-world maps, showcasing Grasper's significant superiority over baselines in both solution quality and generalizability. We demonstrate that Grasper provides a versatile approach for solving pursuit-evasion problems across a broad range of scenarios, enabling practical deployment in real-world situations.
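The GNN-plus-hypernetwork architecture can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the aggregation rule, layer sizes, and the linear policy head generated by the hypernetwork are all illustrative choices, and all function names (`gnn_encode`, `hypernet_policy`) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_encode(adj, feats, W, hops=2):
    # Toy GNN: mean-aggregation message passing over the game graph,
    # followed by mean-pooling into a single graph-level hidden vector.
    h = feats
    deg = adj.sum(1, keepdims=True) + 1e-8
    for _ in range(hops):
        h = np.tanh(((adj @ h) / deg) @ W)
    return h.mean(0)  # hidden vector z encoding the PEG

def hypernet_policy(z, H_w, H_b, obs_dim, n_actions):
    # Hypernetwork: maps the PEG embedding z to the parameters of a
    # small linear policy head, so each PEG gets its own pursuer policy.
    theta = H_w @ z + H_b
    W_pi = theta[: obs_dim * n_actions].reshape(obs_dim, n_actions)
    b_pi = theta[obs_dim * n_actions:]
    def policy(obs):
        logits = obs @ W_pi + b_pi
        e = np.exp(logits - logits.max())  # softmax over actions
        return e / e.sum()
    return policy

# Toy 4-node cycle graph standing in for a street network.
n, d, obs_dim, n_actions = 4, 3, 5, 4
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(n, d))
W = rng.normal(size=(d, d))
H_w = rng.normal(size=(obs_dim * n_actions + n_actions, d)) * 0.1
H_b = np.zeros(obs_dim * n_actions + n_actions)

z = gnn_encode(adj, feats, W)
pi = hypernet_policy(z, H_w, H_b, obs_dim, n_actions)
probs = pi(rng.normal(size=obs_dim))  # a valid action distribution
```

The key design point this sketch captures is that the policy's weights are *generated*, not trained per task: given a new PEG, a single forward pass through the encoder and hypernetwork yields a tailored policy without per-instance optimization.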
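The heuristic reference policies used in the HMP stage can be sketched as follows. Under the assumption (hedged: the abstract only says the references are derived via Dijkstra's algorithm) that a reasonable pursuer heuristic is to move toward the evader along shortest paths, a reference distribution over a pursuer's neighboring nodes might look like this; `heuristic_reference_policy` and the greedy tie-breaking rule are illustrative, not the paper's exact construction.

```python
import heapq

def dijkstra(adj, src):
    # adj: {node: [(neighbor, edge_weight), ...]}; returns shortest
    # distances from src to every reachable node.
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def heuristic_reference_policy(adj, pursuer, evader):
    # Shortest-path distance from every node to the evader
    # (the graph is assumed undirected here).
    dist = dijkstra(adj, evader)
    nbrs = [v for v, _ in adj[pursuer]]
    # Greedy reference: uniform mass on the neighbor(s) that most
    # reduce the distance to the evader.
    best = min(dist.get(v, float("inf")) for v in nbrs)
    mass = {v: 1.0 if dist.get(v, float("inf")) == best else 0.0
            for v in nbrs}
    total = sum(mass.values())
    return {v: p / total for v, p in mass.items()}

# Path graph 0-1-2-3: pursuer at node 0, evader at node 3.
adj = {0: [(1, 1.0)],
       1: [(0, 1.0), (2, 1.0)],
       2: [(1, 1.0), (3, 1.0)],
       3: [(2, 1.0)]}
ref = heuristic_reference_policy(adj, pursuer=0, evader=3)
```

In the HMP stage such a distribution would serve as a regularization target for the learned pursuer policy (e.g., via a divergence penalty), steering multi-task pre-training toward sensible behavior before PSRO fine-tuning.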