Hypergraphs are a generalisation of graphs in which a hyperedge can connect any number of vertices. Compared to conventional graphs, they can describe n-ary relationships and high-order information among entities. In this paper, we study the fundamental problem of subhypergraph matching in hypergraphs. Existing methods directly extend subgraph matching algorithms to hypergraphs. However, this approach delays hyperedge verification and underutilises the high-order information in hypergraphs, which leads to a large search space and high enumeration cost. Furthermore, as hypergraphs grow in size, computing subhypergraph matches sequentially is becoming impractical. We therefore propose HGMatch, an efficient parallel subhypergraph matching system for massive hypergraphs. We propose a novel match-by-hyperedge framework that utilises the high-order information in hypergraphs and uses set operations for efficient candidate generation. Moreover, we develop an optimised parallel execution engine in HGMatch based on the dataflow model, which features a task-based scheduler and fine-grained dynamic work stealing to achieve bounded-memory execution and better load balancing. Experimental evaluation on 10 real-world datasets shows that HGMatch outperforms the extended versions of the state-of-the-art subgraph matching algorithms (CFL, DAF, CECI and RapidMatch) by orders of magnitude when using a single thread, and achieves almost linear scalability as the number of threads increases.
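To make the idea of generating candidates per hyperedge with set operations more concrete, the following is a minimal Python sketch. It is an illustration under our own assumptions, not HGMatch's actual implementation: the label "signature" of a hyperedge, the two indexes, and the function names `signature`, `build_indexes` and `candidates` are all hypothetical, and the sketch only filters candidate hyperedges without verifying a complete embedding.

```python
from collections import defaultdict


def signature(edge, labels):
    """Sorted tuple of the vertex labels in a hyperedge (hypothetical helper)."""
    return tuple(sorted(labels[v] for v in edge))


def build_indexes(edges, labels):
    """Build two illustrative indexes over the data hypergraph."""
    by_signature = defaultdict(set)  # signature -> ids of hyperedges having it
    incident = defaultdict(set)      # vertex -> ids of hyperedges containing it
    for eid, edge in enumerate(edges):
        by_signature[signature(edge, labels)].add(eid)
        for v in edge:
            incident[v].add(eid)
    return by_signature, incident


def candidates(query_sig, matched_ids, edges, by_signature, incident):
    """Data hyperedges with the query signature that share a vertex with
    an already matched hyperedge (all of them if nothing is matched yet)."""
    same_sig = by_signature.get(query_sig, set())
    if not matched_ids:
        return set(same_sig)
    touching = set()
    for eid in matched_ids:
        for v in edges[eid]:
            touching |= incident[v]          # set union over incident hyperedges
    return (same_sig & touching) - set(matched_ids)  # intersection, then difference


# Toy data hypergraph: vertex labels and hyperedges as frozensets of vertex ids.
labels = {0: "A", 1: "B", 2: "C", 3: "A", 4: "B"}
edges = [frozenset({0, 1, 2}), frozenset({2, 3}),
         frozenset({3, 4}), frozenset({0, 1})]
by_signature, incident = build_indexes(edges, labels)

first = candidates(("A", "B"), [], edges, by_signature, incident)
after_e0 = candidates(("A", "B"), [0], edges, by_signature, incident)
print(sorted(first), sorted(after_e0))
```

In this toy example, two data hyperedges carry the labels A and B, but only one of them shares a vertex with hyperedge 0, so matching hyperedge 0 first narrows the candidates through a set intersection rather than through vertex-by-vertex extension.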