The maximum capability of a topological feature in link prediction

Networks offer a powerful approach to modeling complex systems by representing the underlying set of pairwise interactions. Link prediction is the task of inferring links of a network that are not directly observed, with important applications in biological, social, and other complex systems. Despite the extensive use of topological features in this task, it is unclear to what extent a given feature can be leveraged to infer missing links. Here, we aim to unveil the capability of a topological feature in link prediction by identifying the upper bound of its prediction performance. We introduce a theoretical framework that is compatible with different indexes to gauge the feature, different prediction approaches to utilize the feature, and different metrics to quantify the prediction performance. The maximum capability of a topological feature follows a simple yet theoretically validated expression, which depends only on the extent to which the feature is held in missing and nonexistent links. Because a family of indexes based on the same feature shares the same upper bound, the potential of all the others can be estimated from a single index. Furthermore, a feature's capability is lifted in supervised prediction, and this lift can be mathematically quantified, allowing us to estimate the benefit of applying machine learning algorithms. The universality of the uncovered pattern is empirically verified on 550 structurally diverse networks. The findings have applications in feature and method selection, and shed light on the network characteristics that make a topological feature effective in link prediction.
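To make the setup concrete, the following is a minimal sketch of an unsupervised link prediction experiment on a toy network. It is not the framework or the upper-bound expression from the paper; the choice of the common-neighbors index, the AUC metric, the toy graph, and all function names are assumptions made purely for illustration. It shows the two groups the abstract refers to: missing links (true links hidden from the observed network) and nonexistent links (node pairs that were never connected).

```python
from itertools import combinations


def build_adjacency(nodes, edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Common-neighbors index: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def auc(missing_scores, nonexistent_scores):
    """Probability that a random missing link outscores a random
    nonexistent link, counting ties as one half."""
    wins = 0.0
    for m in missing_scores:
        for n in nonexistent_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(missing_scores) * len(nonexistent_scores))


# Hypothetical toy network: two triangles joined by a bridge (2-3).
nodes = list(range(6))
all_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

# Hide two true links; the predictor only sees the remaining ones.
missing = [(0, 1), (4, 5)]
observed = [e for e in all_edges if e not in missing]

# Nonexistent links: node pairs that are not edges of the full network.
edge_set = set(all_edges)
nonexistent = [p for p in combinations(nodes, 2) if p not in edge_set]

adj = build_adjacency(nodes, observed)
missing_scores = [common_neighbors(adj, u, v) for u, v in missing]
nonexistent_scores = [common_neighbors(adj, u, v) for u, v in nonexistent]

auc_value = auc(missing_scores, nonexistent_scores)
print(f"AUC of common neighbors on the toy network: {auc_value:.2f}")
```

On this toy network both hidden links share one common neighbor, while half of the eight nonexistent pairs share none, so the index separates the two groups only partially and the AUC is 0.75. How strongly a feature is held in missing versus nonexistent links is the quantity the abstract says the upper bound depends on; the sketch only measures one index's achieved performance, not that bound.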


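Link prediction in networks refers to estimating, from known information such as the network's nodes and structure, the likelihood that two nodes not yet connected by an edge will form a link. This covers both links that exist yet are unknown and links that may form in the future. Research on this problem is significant in both theory and application.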
