Networks offer a powerful approach to modeling complex systems by representing their underlying pairwise interactions. Link prediction is the task of inferring links in a network that are not directly observed, with important applications in biological, social, and other complex systems. Although topological features are used extensively in this task, it is unclear to what extent a given feature can be leveraged to infer missing links. Here, we aim to reveal the capability of a topological feature in link prediction by identifying the upper bound of its prediction performance. We introduce a theoretical framework that is compatible with different indexes for gauging the feature, different prediction approaches for utilizing it, and different metrics for quantifying prediction performance. The maximum capability of a topological feature follows a simple, theoretically validated expression that depends only on the extent to which the feature is held in missing and nonexistent links. Because a family of indexes based on the same feature shares the same upper bound, the potential of all other indexes in the family can be estimated from a single one. Furthermore, a feature's capability is lifted in supervised prediction, and the gain can be quantified mathematically, allowing us to estimate the benefit of applying machine learning algorithms. The universality of the uncovered pattern is empirically verified on 550 structurally diverse networks. The findings have applications in feature and method selection and shed light on the network characteristics that make a topological feature effective in link prediction.
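The quantities named above can be made concrete on a toy case. The following is a minimal sketch, not the framework or the upper-bound expression itself: it assumes the common-neighbor count as the index, AUC as the metric, and a hand-made six-node network in which two links are hidden. All names (`build_adjacency`, `common_neighbors`, `auc`, `p_missing`, `p_nonexistent`) are illustrative. It reports the AUC of the index together with the fraction of missing links and of nonexistent links that hold the feature, here taken to mean having at least one common neighbor.

```python
from itertools import combinations


def build_adjacency(nodes, edges):
    """Adjacency sets for an undirected network."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Common-neighbor index: number of shared neighbors of u and v."""
    return len(adj[u] & adj[v])


def auc(pos_scores, neg_scores):
    """Probability that a missing link outscores a nonexistent link (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy network (assumed for illustration); node pairs are stored as (small, large).
nodes = range(6)
true_edges = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)}
missing = [(1, 2), (3, 4)]              # links hidden from the predictor
observed = true_edges - set(missing)    # links the predictor can see
nonexistent = [pair for pair in combinations(nodes, 2) if pair not in true_edges]

adj = build_adjacency(nodes, observed)
pos = [common_neighbors(adj, u, v) for u, v in missing]
neg = [common_neighbors(adj, u, v) for u, v in nonexistent]

# Extent to which the feature is held in each class of node pairs.
p_missing = sum(s > 0 for s in pos) / len(pos)
p_nonexistent = sum(s > 0 for s in neg) / len(neg)
auc_cn = auc(pos, neg)

print(f"AUC = {auc_cn}, p_missing = {p_missing}, p_nonexistent = {p_nonexistent}")
```

Swapping `common_neighbors` for another index built on the same feature (for example, a normalized variant) changes the ranking among pairs that hold the feature but leaves `p_missing` and `p_nonexistent` unchanged, which is the intuition behind a bound shared by a family of indexes.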