Exact subgraph matching on large-scale graphs remains challenging because of its high computational complexity and the constraints of distributed systems. Existing GNN-based path embedding (GNN-PE) frameworks achieve efficient exact matching on a single machine, but they neither scale to nor are optimized for distributed environments. To close this gap, we propose three core innovations that extend GNN-PE to distributed systems: (1) a lightweight, dynamic, correlation-aware load-balancing and hot-migration mechanism that fuses multi-dimensional metrics (CPU, communication, memory) and guarantees index consistency; (2) a multi-GPU collaborative dynamic caching strategy based on online incremental learning, with heterogeneous-GPU adaptation and graph-structure-aware replacement; and (3) a query plan ranking method that optimizes execution order according to the pruning potential of dominance embeddings (PE-score). By combining METIS partitioning, parallel offline preprocessing, and lightweight metadata management, our approach achieves "minimum edge cut + load balancing + uninterrupted queries" in distributed settings of tens of machines, significantly improving the efficiency and stability of distributed subgraph matching.