Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent advances have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in policy-space response oracles (PSRO) for improving scalability when solving large-scale PEGs. However, these methods focus on specific PEGs with fixed initial conditions, whereas initial conditions may vary substantially in real-world scenarios, which significantly hinders the applicability of traditional methods. To address this issue, we introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs. Our contributions are threefold. First, we present a novel architecture that offers high-quality solutions for diverse PEGs, comprising two critical components: (i) a graph neural network (GNN) that encodes PEGs into hidden vectors, and (ii) a hypernetwork that generates pursuer policies conditioned on these hidden vectors. Second, we develop an efficient three-stage training method involving (i) a pre-pretraining stage that learns robust PEG representations via self-supervised graph learning techniques such as GraphMAE, (ii) a pre-training stage that applies heuristic-guided multi-task pre-training (HMP), in which heuristic-derived reference policies (e.g., computed with Dijkstra's algorithm) regularize the pursuer policies, and (iii) a fine-tuning stage that employs PSRO to generate pursuer policies for designated PEGs. Finally, we perform extensive experiments on synthetic and real-world maps, showcasing Grasper's significant superiority over baselines in both solution quality and generalizability. We demonstrate that Grasper provides a versatile approach for solving pursuit-evasion problems across a broad range of scenarios, enabling practical deployment in real-world situations.
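The GNN-plus-hypernetwork architecture can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the aggregation rule, layer sizes, and the linear policy head generated by the hypernetwork are all illustrative choices, and all function names (`gnn_encode`, `hypernet_policy`) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_encode(adj, feats, W, hops=2):
    # Toy GNN: mean-aggregation message passing over the game graph,
    # followed by mean-pooling into a single graph-level hidden vector.
    h = feats
    deg = adj.sum(1, keepdims=True) + 1e-8
    for _ in range(hops):
        h = np.tanh(((adj @ h) / deg) @ W)
    return h.mean(0)  # hidden vector z encoding the PEG

def hypernet_policy(z, H_w, H_b, obs_dim, n_actions):
    # Hypernetwork: maps the PEG embedding z to the parameters of a
    # small linear policy head, so each PEG gets its own pursuer policy.
    theta = H_w @ z + H_b
    W_pi = theta[: obs_dim * n_actions].reshape(obs_dim, n_actions)
    b_pi = theta[obs_dim * n_actions:]
    def policy(obs):
        logits = obs @ W_pi + b_pi
        e = np.exp(logits - logits.max())  # softmax over actions
        return e / e.sum()
    return policy

# Toy 4-node cycle graph standing in for a street network.
n, d, obs_dim, n_actions = 4, 3, 5, 4
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(n, d))
W = rng.normal(size=(d, d))
H_w = rng.normal(size=(obs_dim * n_actions + n_actions, d)) * 0.1
H_b = np.zeros(obs_dim * n_actions + n_actions)

z = gnn_encode(adj, feats, W)
pi = hypernet_policy(z, H_w, H_b, obs_dim, n_actions)
probs = pi(rng.normal(size=obs_dim))  # a valid action distribution
```

The key design point this sketch captures is that the policy's weights are *generated*, not trained per task: given a new PEG, a single forward pass through the encoder and hypernetwork yields a tailored policy without per-instance optimization.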
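The heuristic reference policies used in the HMP stage can be sketched as follows. Under the assumption (hedged: the abstract only says the references are derived via Dijkstra's algorithm) that a reasonable pursuer heuristic is to move toward the evader along shortest paths, a reference distribution over a pursuer's neighboring nodes might look like this; `heuristic_reference_policy` and the greedy tie-breaking rule are illustrative, not the paper's exact construction.

```python
import heapq

def dijkstra(adj, src):
    # adj: {node: [(neighbor, edge_weight), ...]}; returns shortest
    # distances from src to every reachable node.
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def heuristic_reference_policy(adj, pursuer, evader):
    # Shortest-path distance from every node to the evader
    # (the graph is assumed undirected here).
    dist = dijkstra(adj, evader)
    nbrs = [v for v, _ in adj[pursuer]]
    # Greedy reference: uniform mass on the neighbor(s) that most
    # reduce the distance to the evader.
    best = min(dist.get(v, float("inf")) for v in nbrs)
    mass = {v: 1.0 if dist.get(v, float("inf")) == best else 0.0
            for v in nbrs}
    total = sum(mass.values())
    return {v: p / total for v, p in mass.items()}

# Path graph 0-1-2-3: pursuer at node 0, evader at node 3.
adj = {0: [(1, 1.0)],
       1: [(0, 1.0), (2, 1.0)],
       2: [(1, 1.0), (3, 1.0)],
       3: [(2, 1.0)]}
ref = heuristic_reference_policy(adj, pursuer=0, evader=3)
```

In the HMP stage such a distribution would serve as a regularization target for the learned pursuer policy (e.g., via a divergence penalty), steering multi-task pre-training toward sensible behavior before PSRO fine-tuning.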