Subgraph matching has attracted increasing attention because of its diverse real-world applications. Because real-world graphs are dynamic, much research has focused on handling graph updates without incurring prohibitive overhead. However, existing approaches to dynamic subgraph matching typically proceed serially, retrieving the incremental matches for each updated edge one at a time. This falls short when updates arrive in batches and reduces system throughput. GPUs, which can run a massive number of threads in parallel, are widely used to accelerate workloads in many domains. Surprisingly, subgraph matching on batch-dynamic graphs, particularly on GPUs, has not been systematically explored. In this paper, we bridge this gap by introducing GAMMA (GPU-Accelerated Batch-Dynamic Subgraph Matching), an efficient framework built around a DFS-based, warp-centric batch-dynamic subgraph matching algorithm. To balance load during the DFS-based search, we propose warp-level work stealing via shared memory. Additionally, we introduce coalesced search to reduce redundant computation. Comprehensive experiments show that GAMMA outperforms state-of-the-art algorithms by up to hundreds of times.
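To make the problem statement concrete, the following is a minimal CPU-side sketch of incremental matching for a batch of edge insertions. It is not GAMMA's GPU algorithm: the function name `incremental_matches`, the adjacency-dictionary representation, and the set-based deduplication are assumptions chosen for illustration. It also assumes an undirected, unlabeled graph without self-loops, a connected query, and insertions only. Each inserted edge is used as a seed that is mapped onto every query edge in both orientations, and a DFS extends the partial mapping.

```python
def incremental_matches(adj, query_edges, batch):
    """Return the embeddings of the query that use at least one batch edge.

    adj:         dict mapping each data vertex to a set of neighbours (mutated)
    query_edges: list of (a, b) pairs describing a connected query graph
    batch:       list of (u, v) edges inserted into the data graph
    """
    # Build the query adjacency lists.
    q_adj = {}
    for a, b in query_edges:
        q_adj.setdefault(a, set()).add(b)
        q_adj.setdefault(b, set()).add(a)
    q_vertices = sorted(q_adj)

    # Apply the whole batch before searching, so a match may use several new edges.
    for u, v in batch:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    found = set()  # deduplicates matches reached from more than one seed edge

    def dfs(mapping):
        if len(mapping) == len(q_vertices):
            found.add(tuple(mapping[q] for q in q_vertices))
            return
        # Pick an unmapped query vertex adjacent to an already mapped one.
        q = next(x for x in q_vertices
                 if x not in mapping and any(n in mapping for n in q_adj[x]))
        mapped_nbrs = [mapping[n] for n in q_adj[q] if n in mapping]
        # Candidates must be adjacent to the images of all mapped neighbours.
        candidates = set.intersection(*(adj[d] for d in mapped_nbrs))
        for d in candidates - set(mapping.values()):  # keep the mapping injective
            mapping[q] = d
            dfs(mapping)
            del mapping[q]

    # Seed the search from every inserted edge, query edge, and orientation.
    for u, v in batch:
        for a, b in query_edges:
            for x, y in ((u, v), (v, u)):
                dfs({a: x, b: y})
    return found


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    graph = {0: {1}, 1: {0, 2}, 2: {1}}
    print(sorted(incremental_matches(graph, triangle, [(0, 2)])))
```

In this example, inserting edge (0, 2) closes the path 0-1-2 into a triangle, so the call returns the six embeddings of the triangle query, one per automorphism. The seeds are independent of one another, which is the source of parallelism a batch-oriented design can exploit. This sketch simply processes them in a sequential loop.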