The link prediction problem has become increasingly prominent in many domains, such as social network analysis, bioinformatics experiments, transportation networks, and criminal investigations. A variety of techniques have been developed for it, and they fall into three categories: 1) similarity-based approaches, which study a set of features to identify similar nodes; 2) learning-based approaches, which extract patterns from the input data; and 3) probabilistic and statistical approaches, which optimize a set of parameters to build a model that best estimates the probability of link formation. However, the existing literature lacks approaches that integrate these categories to exploit the strengths of each. To tackle the link prediction problem, we propose an approach that combines methods from the first and second categories, whereas existing studies use only one of them. Our two-phase method first determines new features related to the position and dynamic behavior of nodes, which makes it more efficient than approaches that rely only on plain measures. Then, a subspace clustering algorithm is applied to group social objects based on the computed similarity measures, which differentiate the strength of clusters; the use of local and global indices, together with the clustering information, plays an essential role in our link prediction process. Extensive experiments on real datasets, including Facebook, Brightkite, and HepTh, indicate that the proposed method performs well. In addition, we experimentally compared our approach with several previous techniques in the area to demonstrate its superiority.
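To make the first category concrete, the following minimal sketch scores unconnected node pairs with two standard local similarity indices, common neighbors and the Jaccard coefficient. It is an illustration only, not the proposed method: the position and dynamic-behavior features and the subspace clustering phase are not reproduced here, and the function names and the toy graph are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Local index: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Local index: shared neighbors divided by the size of the neighborhood union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_candidate_links(adj, score):
    """Rank all currently unconnected node pairs by the given similarity score."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy graph, used only to exercise the functions above.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(toy_edges)
ranked = rank_candidate_links(adj, jaccard)
print(ranked)
```

In a similarity-based predictor, the highest-ranked pairs are taken as the most likely future links. A combined approach such as the one described above would feed scores of this kind, alongside other features, into a clustering or learning step.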