Subgraph matching is a fundamental problem in graph analysis with a wide range of applications. Because the problem is NP-hard, efficiently enumerating subgraph matches on large real-world graphs remains highly challenging. Most existing works adopt a depth-first search (DFS) backtracking strategy, in which a partial embedding is extended one vertex at a time along a branch of the search tree until either a full embedding is found or no further extension is possible. A major limitation of this paradigm is the large amount of duplicate computation performed during enumeration, which increases the overall runtime. To overcome this limitation, we propose a novel subgraph matching algorithm, CEMR. It incorporates two techniques to reduce duplicate extensions: common extension merging, which leverages a black-white vertex encoding, and common extension reusing, which employs common extension buffers. In addition, we design two pruning techniques to discard unpromising search branches. Extensive experiments on real-world datasets and diverse query workloads demonstrate that CEMR outperforms state-of-the-art subgraph matching methods.
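For orientation, the DFS backtracking paradigm described above can be sketched as follows. This is a minimal illustration of the generic baseline only, not of CEMR: the function name `enumerate_matches`, the adjacency-set graph representation, the unlabeled undirected graphs, and the fixed matching order are all assumptions made for the example. Note that the candidate set for a query vertex is recomputed at every search node, even when different partial embeddings map its already-matched neighbors to the same data vertices; this is one plausible instance of the duplicate computation the abstract refers to.

```python
from typing import Dict, Iterator, List, Set

# Undirected graph as adjacency sets (an assumed representation).
Graph = Dict[int, Set[int]]


def enumerate_matches(query: Graph, data: Graph) -> Iterator[Dict[int, int]]:
    """Yield every injective mapping of query vertices to data vertices
    that preserves all query edges, using plain DFS backtracking."""
    order: List[int] = sorted(query)   # hypothetical fixed matching order
    embedding: Dict[int, int] = {}     # current partial embedding
    used: Set[int] = set()             # data vertices already mapped

    def extend(depth: int) -> Iterator[Dict[int, int]]:
        if depth == len(order):
            yield dict(embedding)      # full embedding found
            return
        u = order[depth]
        # Data vertices that the already-matched neighbors of u map to.
        matched = [embedding[w] for w in query[u] if w in embedding]
        if matched:
            # Candidates must be adjacent to all of them.
            candidates = set.intersection(*(data[v] for v in matched))
        else:
            candidates = set(data)
        for v in sorted(candidates):
            if v in used:              # keep the mapping injective
                continue
            embedding[u] = v
            used.add(v)
            yield from extend(depth + 1)
            del embedding[u]           # backtrack
            used.remove(v)

    yield from extend(0)
```

Calling `list(enumerate_matches(query, data))` with a triangle as `query` returns one mapping per automorphism of each triangle in `data`, since the sketch enumerates embeddings rather than distinct subgraphs.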