Link prediction can help rectify inaccuracies in graph algorithms that stem from missing or overlooked links in a network. However, many existing works use a baseline approach whose high time complexity incurs unnecessary computational cost. Further, many studies focus on smaller graphs, which can lead to misleading conclusions. This technical report introduces two parallel approaches, IHub and LHub, which predict links on large graphs using neighborhood-based similarity measures. LHub is a heuristic approach that additionally disregards large hubs, based on the idea that high-degree nodes contribute little similarity among their neighbors. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, LHub is on average 1019x faster than IHub, especially on web graphs and social networks, while maintaining similar prediction accuracy. Notably, LHub achieves a link prediction rate of 38.1M edges/s, and its performance improves by a factor of 1.6 for every doubling of threads.
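To make the hub-disregarding idea concrete, the following is a minimal serial sketch in Python, not the report's parallel implementation. It assumes the common-neighbors count as the neighborhood-based similarity measure and an undirected graph given as an edge list. The function name `predict_links` and the `max_degree` cutoff are hypothetical, introduced here only for illustration. With `max_degree=None` every node contributes to the scores; with a cutoff, nodes whose degree exceeds it are skipped as intermediate (shared) neighbors, which is the heuristic the abstract attributes to LHub.

```python
from collections import defaultdict
from itertools import combinations


def predict_links(edges, max_degree=None, top_k=10):
    """Rank non-adjacent node pairs by their number of common neighbors.

    max_degree is a hypothetical cutoff (an assumption of this sketch):
    a node with more neighbors than this is treated as a large hub and
    is not counted as a common neighbor of any pair.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    adj = dict(adj)

    scores = defaultdict(int)
    for w, nbrs in adj.items():
        # Hub-disregarding heuristic: skip high-degree intermediate nodes.
        if max_degree is not None and len(nbrs) > max_degree:
            continue
        # Every pair of neighbors of w shares w as a common neighbor.
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj[u]:  # only score pairs that are not yet linked
                scores[(u, v)] += 1

    # Highest score first; ties broken by node pair for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Node 0 is a small "hub" with four neighbors; node 5 links 1 and 2.
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (2, 5)]
    print(predict_links(edges))                # hub 0 contributes to scores
    print(predict_links(edges, max_degree=3))  # hub 0 is disregarded
```

Skipping a hub of degree d avoids enumerating on the order of d² neighbor pairs, which suggests why the savings would be largest on graphs with heavily skewed degree distributions such as web graphs and social networks.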