GNN-based Anchor Embedding for Efficient Exact Subgraph Matching

Subgraph matching is a fundamental problem in graph data management with a variety of real-world applications. Several recent works use deep learning (DL) techniques to process subgraph matching queries, but most of them return approximate results without accuracy guarantees. Unlike these DL-based inexact methods, we propose a learning-based exact subgraph matching framework, called the \textit{graph neural network (GNN)-based anchor embedding framework} (GNN-AE). Whereas traditional exact subgraph matching methods build auxiliary summary structures online for each query, our method indexes small feature subgraphs of the data graph offline and uses GNNs to perform graph isomorphism tests on these indexed feature subgraphs, which efficiently yields high-quality candidates. To balance query efficiency against index storage cost, we use two types of feature subgraphs: anchored subgraphs and anchored paths. With these techniques, we transform the exact subgraph matching problem into a search problem in the embedding space. Furthermore, to efficiently retrieve all matches, we develop a parallel matching growth algorithm and design a cost-based DFS query planning method to further improve the matching growth algorithm. Extensive experiments on 6 real-world and 3 synthetic datasets indicate that GNN-AE is more efficient than the baselines, outperforming the exploration-based baselines in particular by up to 1--2 orders of magnitude.

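The offline-index and matching-growth pipeline described in the abstract can be illustrated with a small toy sketch. It is not the GNN-AE implementation: as a loudly simplifying assumption, the GNN embedding of a feature subgraph is replaced by a plain endpoint-label key for single edges, the query plan is an ordinary DFS order with no cost model, and the growth loop is sequential rather than parallel. All function names (`build_index`, `dfs_order`, `match`) are hypothetical, and the graphs are assumed to be simple, undirected, vertex-labeled, and (for the query) connected with at least one edge.

```python
from collections import defaultdict


def build_index(labels, edges):
    """Offline step: key each data edge (both directions) by its endpoint labels.

    The label pair stands in for a learned embedding; this is an assumption.
    """
    index = defaultdict(list)
    adj = defaultdict(set)
    for u, v in edges:
        index[(labels[u], labels[v])].append((u, v))
        index[(labels[v], labels[u])].append((v, u))
        adj[u].add(v)
        adj[v].add(u)
    return index, adj


def dfs_order(q_adj, start):
    """Return query vertices in DFS preorder (plain DFS, no cost model)."""
    order, seen, stack = [], set(), [start]
    while stack:
        q = stack.pop()
        if q in seen:
            continue
        seen.add(q)
        order.append(q)
        stack.extend(sorted(q_adj[q] - seen, reverse=True))
    return order


def match(q_labels, q_edges, d_labels, index, d_adj):
    """Online step: grow partial matches one query vertex at a time."""
    q_adj = defaultdict(set)
    for a, b in q_edges:
        q_adj[a].add(b)
        q_adj[b].add(a)
    order = dfs_order(q_adj, q_edges[0][0])
    q0, q1 = order[0], order[1]
    # Seed partial matches by looking up the first DFS edge in the index.
    seeds = index.get((q_labels[q0], q_labels[q1]), [])
    partial = [{q0: u, q1: v} for u, v in seeds]
    matched = {q0, q1}
    for q in order[2:]:
        nbrs = [p for p in q_adj[q] if p in matched]
        grown = []
        for m in partial:
            # A candidate must be adjacent to the images of all matched neighbours.
            cands = set.intersection(*(d_adj[m[p]] for p in nbrs))
            used = set(m.values())
            for v in cands:
                if d_labels[v] == q_labels[q] and v not in used:
                    grown.append({**m, q: v})
        partial = grown
        matched.add(q)
    return partial


# Toy data graph: a triangle 0-1-2 plus a pendant vertex 3 attached to 0.
d_labels = {0: "A", 1: "B", 2: "C", 3: "B"}
d_edges = [(0, 1), (1, 2), (0, 2), (0, 3)]
index, d_adj = build_index(d_labels, d_edges)

# Query: a triangle with labels A, B, C.
q_labels = {"x": "A", "y": "B", "z": "C"}
q_edges = [("x", "y"), ("y", "z"), ("x", "z")]
matches = match(q_labels, q_edges, d_labels, index, d_adj)
print(matches)  # [{'x': 0, 'y': 1, 'z': 2}]
```

In this toy, the index lookup yields two seed candidates for the A-B query edge, (0, 1) and (0, 3), and the growth step discards the second because vertex 3 has no C-labeled neighbor shared with vertex 0. Every query edge is checked against the data graph during growth, so the returned mappings are exact (non-induced) subgraph matches rather than approximate ones.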