Graph matching is a fundamental problem in pattern recognition, with applications in areas such as software analysis and computational biology. One well-known graph matching problem is graph isomorphism, which consists of deciding whether two graphs are structurally identical. Despite its usefulness, the properties that one can check using graph isomorphism are rather limited, since it only allows strict equality checks between two graphs. For example, it cannot check complex structural properties such as whether the target graph is an arbitrary-length sequence followed by an arbitrary-size loop. We propose a generalization of graph isomorphism that allows one to check such properties through a declarative specification. This specification is given in the form of a Regular Graph Pattern (ReGaP), a special type of graph, inspired by regular expressions, that may contain wildcard nodes representing arbitrary structures such as variable-sized sequences or subgraphs. We propose a SAT-based algorithm for checking whether a target graph matches a given ReGaP. We also propose a preprocessing technique for improving the performance of the algorithm and evaluate it extensively on benchmarks from the CodeSearchNet dataset.
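To make the motivating property concrete, the sketch below hand-codes a check for one possible reading of "a sequence followed by a loop": a directed graph in which every node has exactly one successor and the nodes form a simple path that feeds into a simple cycle. This is only an illustration of the kind of property that strict isomorphism cannot express, not the ReGaP formalism or the SAT-based algorithm. The function name `is_sequence_then_loop`, the adjacency-dictionary representation, and the choice to accept an empty sequence (a pure loop) are all assumptions made for this example.

```python
def is_sequence_then_loop(succ):
    """Check whether a directed graph is a simple path feeding into a simple cycle.

    `succ` maps each node to the list of its successors (hypothetical
    representation chosen for this sketch).
    """
    # Collect every node that appears as a source or as a target.
    nodes = set(succ)
    for targets in succ.values():
        nodes.update(targets)
    if not nodes:
        return False

    # In a path-then-cycle shape, every node has exactly one outgoing edge.
    if any(len(succ.get(n, ())) != 1 for n in nodes):
        return False

    # Count incoming edges to locate the start of the sequence, if any.
    indegree = {n: 0 for n in nodes}
    for n in nodes:
        indegree[succ[n][0]] += 1
    starts = [n for n in nodes if indegree[n] == 0]
    if len(starts) > 1:
        return False

    # With no start node the sequence is empty, so begin anywhere on the loop.
    current = starts[0] if starts else next(iter(nodes))

    # Follow the unique successor until a node repeats, then require that
    # the walk covered the whole graph (no disconnected extra components).
    seen = set()
    while current not in seen:
        seen.add(current)
        current = succ[current][0]
    return seen == nodes


short = {"a": ["b"], "b": ["b"]}                            # 1-node sequence, 1-node loop
longer = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["c"]}   # 2-node sequence, 2-node loop
print(is_sequence_then_loop(short), is_sequence_then_loop(longer))
```

The two example graphs differ in size and so are not isomorphic, yet both satisfy the same structural property. A hand-written checker like this has to be rewritten for every new property, which is the gap a declarative pattern with wildcard nodes is meant to close.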