Hypergraphs are a generalisation of graphs in which a hyperedge can connect any number of vertices. Compared to conventional graphs, they can describe n-ary relationships and high-order information among entities. In this paper, we study the fundamental problem of subgraph matching on hypergraphs (i.e., subhypergraph matching). Existing methods directly extend subgraph matching algorithms to the case of hypergraphs. However, this approach delays hyperedge verification and underutilises the high-order information in hypergraphs, which leads to a large search space and high enumeration cost. Furthermore, as hypergraphs grow in size, it is becoming hard to compute subhypergraph matching sequentially. Thus, we propose an efficient and parallel subhypergraph matching system, HGMatch, to handle subhypergraph matching in massive hypergraphs. We propose a novel match-by-hyperedge framework to utilise high-order information in hypergraphs and use set operations for efficient candidate generation. Moreover, we develop an optimised parallel execution engine in HGMatch based on the dataflow model, which features a task-based scheduler and fine-grained dynamic work stealing to achieve bounded-memory execution and better load balancing. Experimental evaluation on 10 real-world datasets shows that HGMatch outperforms the extended versions of the state-of-the-art subgraph matching algorithms (CFL, DAF, CECI and RapidMatch) by orders of magnitude when using a single thread, and achieves almost linear scalability as the number of threads increases.
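To make the idea of generating candidates with set operations concrete, the following is a minimal Python sketch and not HGMatch's actual implementation. It assumes unlabelled vertices, hyperedges stored as `frozenset`s, and a hypothetical inverted index from each vertex to the hyperedges that contain it; the names `build_index` and `candidates` are illustrative only. The filter it applies (equal arity and equal overlap sizes with already-matched hyperedges) is a necessary condition for an embedding, so surviving candidates would still need full verification.

```python
from collections import defaultdict


def build_index(hyperedges):
    """Map each vertex to the set of ids of hyperedges that contain it."""
    index = defaultdict(set)
    for eid, edge in enumerate(hyperedges):
        for v in edge:
            index[v].add(eid)
    return index


def candidates(query, data, data_index, partial, next_q):
    """Return candidate data hyperedge ids for query hyperedge `next_q`.

    `partial` maps already-matched query hyperedge ids to data hyperedge ids.
    This is only a necessary-condition filter (an assumption of this sketch),
    not a complete embedding check.
    """
    q_edge = query[next_q]
    used = set(partial.values())
    # Start from unused data hyperedges with the same arity.
    cand = {eid for eid, e in enumerate(data) if len(e) == len(q_edge)} - used
    for q_prev, d_prev in partial.items():
        shared = len(q_edge & query[q_prev])
        if shared > 0:
            # The candidate must touch the matched data hyperedge: take the
            # union of hyperedges incident to its vertices, then intersect.
            adjacent = set().union(*(data_index[v] for v in data[d_prev]))
            cand &= adjacent
        # An injective vertex mapping preserves overlap sizes exactly.
        cand = {c for c in cand if len(data[c] & data[d_prev]) == shared}
    return cand


# Toy data hypergraph and a two-hyperedge query (illustrative values only).
data = [frozenset({1, 2, 3}), frozenset({3, 4}),
        frozenset({4, 5, 6}), frozenset({1, 2})]
query = [frozenset({"a", "b", "c"}), frozenset({"c", "d"})]
index = build_index(data)

first = candidates(query, data, index, {}, 0)       # arity-3 hyperedges
second = candidates(query, data, index, {0: 0}, 1)  # must share one vertex
print(first, second)
```

In this toy example, data hyperedge `{1, 2}` has the right arity for the second query hyperedge but overlaps the matched hyperedge `{1, 2, 3}` in two vertices instead of one, so it is pruned before any vertex-level work is done.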