Counting and finding triangles in graphs are often used in real-world analytics to characterize cohesiveness and identify communities. In this paper, we present novel sequential and parallel triangle counting algorithms based on identifying horizontal edges in a breadth-first search (BFS) traversal of the graph. Identifying horizontal edges allows our algorithm to drastically reduce the number of edges examined for set intersections. Our new approach is a communication-efficient parallel algorithm that asymptotically reduces communication on massive graphs, such as real social networks and synthetic graphs from the Graph500 Benchmark. Based on our estimates for massive-scale Graph500 graphs, our new algorithm reduces communication by 22.1x on a scale 36 graph and by 181x on a scale 42 graph. Because communication is known to be the dominant cost of distributed parallel triangle counting, our new parallel algorithm can significantly reduce the practical execution time for counting triangles in large graphs.
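The sequential idea can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on assumptions the abstract does not spell out: that a horizontal edge is one whose endpoints lie on the same BFS level, that the graph is simple and undirected, and that vertex labels are comparable so a triangle lying entirely on one level can be counted once. The function name `count_triangles_bfs` and the adjacency-dictionary input are hypothetical choices made for illustration.

```python
from collections import deque


def count_triangles_bfs(adj):
    """Count triangles in a simple undirected graph.

    adj maps each vertex to the set of its neighbours (assumed symmetric,
    no self-loops). Vertex labels are assumed to be comparable.
    """
    # Step 1: BFS from every unvisited vertex to assign a level to each vertex.
    level = {}
    for root in adj:
        if root in level:
            continue
        level[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in level:
                    level[v] = level[u] + 1
                    queue.append(v)

    # Step 2: intersect neighbour sets only for horizontal edges
    # (both endpoints on the same level). Adjacent vertices differ by at
    # most one level, so every triangle has one or three such edges.
    triangles = 0
    for u in adj:
        for v in adj[u]:
            if u < v and level[u] == level[v]:
                for w in adj[u] & adj[v]:
                    # Apex on another level: this is the triangle's only
                    # horizontal edge, so count it here.
                    # Apex on the same level: all three edges are horizontal,
                    # so count only when u < v < w.
                    if level[w] != level[u] or v < w:
                        triangles += 1
    return triangles
```

For example, calling `count_triangles_bfs` on the complete graph on four vertices runs set intersections on only three of its six edges when the BFS starts at vertex 0, since the other three edges connect level 0 to level 1.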