Link prediction is a key task in graph machine learning, with applications as diverse as disease prediction, social network recommendation, and drug discovery. It involves predicting new links that may form between the nodes of a network. Despite its clear importance, existing link prediction models have significant shortcomings. Graph Convolutional Networks, for instance, have proven highly effective for link prediction on a variety of datasets, but they perform poorly on short-path networks and ego networks. This is the problem this work aims to address. In this paper, we present the Node Centrality and Similarity Based Parameterised Model (NCSM), a novel method for link prediction. NCSM integrates node centrality and similarity measures as edge features in a customised Graph Neural Network (GNN) layer, effectively leveraging the topological information of large networks. It is the first parameterised GNN-based link prediction model to consider topological information. The proposed model was evaluated on five benchmark graph datasets, each comprising thousands of nodes and edges. Experimental results show that NCSM outperforms existing state-of-the-art models such as Graph Convolutional Networks and Variational Graph Autoencoders across various metrics and datasets. We attribute this performance to NCSM's integration of node centrality and similarity measures and to its efficient use of topological information.
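To make the idea of using node centrality and similarity measures as edge features concrete, the following is a minimal sketch and not the NCSM implementation. The abstract does not say which centrality or similarity measures are used, so degree centrality and the Jaccard coefficient are assumptions chosen for illustration, and all function names here are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by (n - 1). Assumed centrality measure."""
    denom = max(len(adj) - 1, 1)
    return {node: len(nbrs) / denom for node, nbrs in adj.items()}


def jaccard(adj, u, v):
    """Jaccard coefficient of the two neighbourhoods. Assumed similarity measure."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def edge_features(adj, centrality, u, v):
    """Hypothetical feature vector for the candidate link (u, v)."""
    return [centrality[u], centrality[v], jaccard(adj, u, v)]


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_adjacency(edges)
centrality = degree_centrality(adj)

# Candidate links are the node pairs that are not yet connected.
candidates = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
features = {pair: edge_features(adj, centrality, *pair) for pair in candidates}
```

In a GNN-based model, vectors like these would typically be passed to a message-passing layer alongside the node features. How NCSM parameterises that layer is not described in the abstract, so it is left out of this sketch.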