Link prediction (LP) is an important problem in network science and machine learning research. State-of-the-art LP methods are usually evaluated in a uniform setup that ignores several factors associated with the data and with application-specific needs. We identify a number of such factors, including network type, problem type, the geodesic distance between the end nodes and its distribution over the classes, the nature and applicability of LP methods, class imbalance and its impact on early retrieval, and the choice of evaluation metric, and we present an experimental setup that allows LP methods to be evaluated in a rigorous and controlled manner. In this controlled setup, we perform extensive experiments with a variety of LP methods on real network datasets and, through an array of carefully designed hypotheses, gather insights into how these factors interact with LP performance. Based on these insights, we provide recommendations to be followed as best practice for evaluating LP methods.