Bipartite graphs model relationships between two distinct sets of entities, such as actor-movie, user-item, and author-paper. The butterfly, a $2\times 2$ biclique with four vertices and four edges, is the simplest cohesive motif in a bipartite graph and a fundamental building block of higher-order substructures. Counting and enumerating butterflies benefits a range of applications, including fraud detection, graph embedding, and community search. While the corresponding motif in unipartite graphs, the triangle, has been widely studied in both static and temporal settings, the extension of butterflies to temporal bipartite graphs remains unexplored. In this paper, we investigate the temporal butterfly counting and enumeration problem: counting and enumerating the butterflies whose edges are established in a prescribed order within a given duration. Towards efficient computation, we first devise a non-trivial baseline rooted in the state-of-the-art butterfly counting algorithm for static graphs. We then explore the intrinsic properties of temporal butterflies and develop a new optimization framework with a compact data structure and an effective priority strategy. We prove that this framework significantly reduces the time complexity without compromising space efficiency. In addition, we generalize our algorithms to practical streaming settings and multi-core computing architectures. Extensive experiments on 11 large-scale real-world datasets demonstrate the efficiency and scalability of our solutions.
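To make the problem statement concrete, here is a naive brute-force sketch in Python. It is not the baseline or the optimized framework described above, and it rests on a hypothetical definition that the abstract does not spell out. A temporal butterfly is taken to be four timestamped edge instances $(u_1,v_1), (u_1,v_2), (u_2,v_1), (u_2,v_2)$ whose timestamps span at most `delta` (standing in for the "given duration"). Their time-sorted sequence must also satisfy `order_ok`, a hypothetical predicate standing in for the "prescribed order". The function name `enumerate_temporal_butterflies` and its parameters are illustrative only.

```python
from collections import defaultdict
from itertools import combinations, product
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# An edge instance: (upper vertex, lower vertex, timestamp).
Edge = Tuple[Hashable, Hashable, int]


def enumerate_temporal_butterflies(
    edges: Iterable[Edge],
    delta: int,
    order_ok: Optional[Callable[[List[Edge]], bool]] = None,
) -> List[Tuple[Edge, ...]]:
    """Brute-force enumeration under a HYPOTHETICAL temporal-butterfly definition.

    Four edge instances (u1,v1), (u1,v2), (u2,v1), (u2,v2) qualify if their
    timestamps span at most `delta` and, when sorted by timestamp, they
    satisfy the optional ordering predicate `order_ok`.
    """
    # Group edge instances by (u, v) pair so multi-edges keep every timestamp.
    by_pair = defaultdict(list)
    neighbors = defaultdict(set)
    for u, v, t in edges:
        by_pair[(u, v)].append((u, v, t))
        neighbors[u].add(v)

    result = []
    # Each unordered pair of upper vertices plus each unordered pair of their
    # common lower neighbors forms one static butterfly.
    for u1, u2 in combinations(sorted(neighbors, key=repr), 2):
        common = sorted(neighbors[u1] & neighbors[u2], key=repr)
        for v1, v2 in combinations(common, 2):
            # Pick one timestamped instance for each of the four edges.
            for inst in product(by_pair[(u1, v1)], by_pair[(u1, v2)],
                                by_pair[(u2, v1)], by_pair[(u2, v2)]):
                times = [e[2] for e in inst]
                if max(times) - min(times) > delta:
                    continue  # edges do not fit within the duration
                ordered = sorted(inst, key=lambda e: e[2])
                if order_ok is None or order_ok(ordered):
                    result.append(tuple(ordered))
    return result


if __name__ == "__main__":
    edges = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("b", "y", 4),
             ("c", "x", 20), ("c", "y", 21)]
    print(len(enumerate_temporal_butterflies(edges, delta=5)))   # 1
    print(len(enumerate_temporal_butterflies(edges, delta=20)))  # 3
```

The sketch inspects every static butterfly and every combination of edge instances. On large graphs this cost becomes prohibitive, which is why the abstract turns to a baseline built on static butterfly counting and then to a compact data structure and priority strategy.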