Link prediction can help rectify inaccuracies in various graph algorithms that stem from links which are missing from, or overlooked in, the observed network. However, many existing works use a baseline approach whose high time complexity incurs unnecessary computational cost. Further, many studies focus on smaller graphs, which can lead to misleading conclusions. Here, we study link prediction using neighborhood-based similarity measures on large graphs. In particular, we improve upon the baseline approach (IBase) and propose a heuristic approach that additionally disregards large hubs (DLH), based on the idea that high-degree nodes contribute little similarity among their neighbors. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, DLH is on average 1019x faster than IBase, with especially large speedups on web graphs and social networks, while maintaining similar prediction accuracy. Notably, DLH achieves a link prediction rate of 38.1M edges/s and improves performance by 1.6x for every doubling of threads.
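To make the hub-skipping idea concrete, here is a minimal sequential sketch, not the authors' implementation. It assumes Jaccard similarity as the neighborhood-based measure and an undirected graph stored as adjacency sets. It also assumes a hypothetical degree cutoff `max_degree` that decides what counts as a "large hub". The names `build_graph` and `predict_links_dlh` are illustrative only. Candidate pairs are found by a two-hop traversal from each node. Intermediate nodes whose degree exceeds the cutoff are skipped, so hubs are simply not counted as common neighbors.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed layout)


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build symmetric adjacency sets from an undirected edge list."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def predict_links_dlh(graph: Graph, max_degree: int,
                      top_k: int) -> List[Tuple[int, int, float]]:
    """Score non-adjacent pairs (u, v) by Jaccard similarity, counting common
    neighbours only through intermediate nodes of degree <= max_degree.

    Hypothetical sketch: skipping hubs undercounts common neighbours, so the
    scores approximate the exact Jaccard coefficient."""
    scores: List[Tuple[int, int, float]] = []
    for u, nbrs_u in graph.items():
        common: Dict[int, int] = defaultdict(int)
        for w in nbrs_u:
            # Disregard large hubs: assumed to add little similarity.
            if len(graph[w]) > max_degree:
                continue
            for v in graph[w]:
                # Visit each unordered pair once; skip already-linked pairs.
                if v > u and v not in nbrs_u:
                    common[v] += 1
        for v, c in common.items():
            union = len(nbrs_u) + len(graph[v]) - c
            scores.append((u, v, c / union))
    # Highest score first; ties broken by node IDs for a stable order.
    scores.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scores[:top_k]


if __name__ == "__main__":
    # Toy graph: node 4 is a hub (degree 5) attached to leaves 5, 6, 7.
    edges = [(0, 1), (0, 2), (1, 3), (2, 3),
             (4, 0), (4, 3), (4, 5), (4, 6), (4, 7)]
    g = build_graph(edges)
    print(predict_links_dlh(g, max_degree=3, top_k=3))
```

Setting `max_degree` at or above the largest degree disables hub skipping and yields exact Jaccard scores, which is a convenient reference for checking accuracy. In the toy graph, skipping hub 4 drops the pairs among its leaves 5, 6 and 7, which would otherwise all tie at a score of 1.0. Hubs are worth skipping because a node of degree d generates d² two-hop paths through itself, so high-degree nodes dominate the traversal cost.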