Link prediction can help correct inaccuracies in community detection that stem from missing or overlooked links in a network. Many existing works rely on a baseline approach whose high time complexity incurs unnecessary computational cost. Further, many studies focus on small graphs, which can lead to misleading conclusions. This report introduces two parallel approaches, IHub and LHub, which predict links using neighborhood-based similarity measures on large graphs. LHub is a heuristic approach that additionally disregards large hubs, based on the idea that low-degree nodes contribute significant similarity among neighbors. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, LHub is on average 563x faster than IHub, with the largest gains on web graphs and social networks, while achieving similar prediction accuracy. Notably, LHub achieves a link prediction rate of 38.1 million edges/s, and its performance improves by a factor of 1.6x for every doubling of threads.
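To make the hub-skipping idea concrete, here is a minimal sequential sketch in Python. It is not the report's IHub or LHub implementation: it assumes common-neighbor count as the similarity measure, and the function name `predict_links` and the `max_hub_degree` cutoff are hypothetical names introduced only for illustration. Candidate pairs are found by walking two hops from each node, and intermediate neighbors whose degree exceeds the cutoff are skipped.

```python
from collections import defaultdict

def predict_links(adj, max_hub_degree=None, top_k=5):
    """Rank non-adjacent node pairs by their number of common neighbors.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    max_hub_degree: hypothetical cutoff; intermediate neighbors with a larger
        degree are ignored. None disables the cutoff.
    """
    scores = defaultdict(int)
    for u in adj:
        for v in adj[u]:
            # Assumed heuristic: skip high-degree intermediate nodes (hubs).
            if max_hub_degree is not None and len(adj[v]) > max_hub_degree:
                continue
            for w in adj[v]:
                # w > u counts each unordered pair once and excludes w == u.
                if w > u and w not in adj[u]:
                    scores[(u, w)] += 1
    # Highest score first; ties broken by node pair for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

# Small undirected example graph in which node 4 acts as a hub (degree 5).
adj = {
    0: {1, 2, 4},
    1: {0, 3, 4},
    2: {0, 3, 4},
    3: {1, 2, 4},
    4: {0, 1, 2, 3, 5},
    5: {4},
}

all_candidates = predict_links(adj)
without_hubs = predict_links(adj, max_hub_degree=3)
```

On this toy graph, applying the cutoff removes every candidate pair that is connected only through node 4, so far fewer pairs need to be scored, while the two strongest candidates, (0, 3) and (1, 2), remain at the top of the ranking.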