Exact subgraph matching is the classic problem of finding all subgraphs of a large-scale data graph that are isomorphic to a given query graph, and it is increasingly important in many real-world applications. In this paper, we propose a novel and effective graph neural network (GNN)-based path embedding framework (GNN-PE), which supports efficient exact subgraph matching without introducing false dismissals. Unlike traditional GNN-based graph embeddings, which produce only approximate subgraph matching results, we carefully design GNN-based embeddings for paths such that, if two paths (together with the 1-hop neighbors of the vertices on them) have a subgraph relationship, their corresponding GNN-based embedding vectors strictly follow the dominance relationship. Using this path dominance embedding property, we propose effective pruning strategies based on path label and path dominance embeddings that guarantee no false dismissals for subgraph matching. We build multidimensional indexes over the path embedding vectors and develop an efficient subgraph matching algorithm that traverses the indexes over graph partitions in parallel while applying our pruning methods. We also propose a cost-model-based query plan that obtains query paths from the query graph with low query cost. To further optimize GNN-PE, we propose a more efficient GNN-based path group embedding (GNN-PGE) technique, which performs subgraph matching over grouped path embedding vectors, and we design effective pruning strategies over grouped path embeddings that can significantly reduce the search space during index traversal. Extensive experiments on both real and synthetic graph data confirm the efficiency and effectiveness of GNN-PE and GNN-PGE for exact subgraph matching.
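To make the dominance-based pruning idea concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes that "dominance" means the query path's embedding is component-wise less than or equal to the data path's embedding, and that label pruning is a plain in-order comparison of vertex labels along the path. The names `dominated_by` and `filter_candidates`, as well as the toy vectors, are hypothetical and chosen only for illustration. In a real system the embeddings would come from a trained GNN and the scan would be replaced by a multidimensional index lookup.

```python
from typing import List, Sequence, Tuple

# A candidate data path: (path id, vertex labels along the path, embedding).
# This layout is an assumption made for the example.
DataPath = Tuple[str, Tuple[str, ...], Tuple[float, ...]]


def dominated_by(query_vec: Sequence[float], data_vec: Sequence[float]) -> bool:
    """Return True if query_vec[i] <= data_vec[i] in every dimension.

    Assumed direction: the embedding of the (smaller) query path is
    dominated by the embedding of any data path that contains it.
    """
    if len(query_vec) != len(data_vec):
        raise ValueError("embeddings must have the same dimension")
    return all(q <= d for q, d in zip(query_vec, data_vec))


def filter_candidates(
    query_labels: Sequence[str],
    query_vec: Sequence[float],
    data_paths: Sequence[DataPath],
) -> List[str]:
    """Keep only data paths that pass both pruning checks.

    Survivors are candidates only; an exact isomorphism check is still
    needed afterwards to discard false positives.
    """
    survivors = []
    for path_id, labels, vec in data_paths:
        # Label pruning (simplified): labels along the path must match.
        if tuple(labels) != tuple(query_labels):
            continue
        # Dominance pruning: drop paths whose embedding does not dominate.
        if not dominated_by(query_vec, vec):
            continue
        survivors.append(path_id)
    return survivors


# Toy data with made-up embedding values.
data_paths = [
    ("p1", ("A", "B"), (0.5, 0.9)),
    ("p2", ("A", "B"), (0.2, 0.9)),
    ("p3", ("A", "C"), (0.9, 0.9)),
]
candidates = filter_candidates(("A", "B"), (0.4, 0.7), data_paths)
print(candidates)
```

Under these assumptions, a path is discarded only when a check that every true match must pass has failed, which is why such a filter cannot cause false dismissals. It can still let through non-matching paths, so the surviving candidates need exact verification.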


