Identifying essential proteins in protein-protein interaction networks (PINs) can help to discover drug targets and prevent disease. To improve identification accuracy, researchers have attempted to obtain a refined PIN by combining multiple types of biological information to filter out unreliable interactions. Unfortunately, repeated refinement drastically reduces the number of nodes in the PIN and yields a sparser network, which leaves a considerable portion of essential proteins unidentifiable. In this paper, we propose a multi-layer refined network (MR-PIN) that addresses this problem. First, four refined networks are constructed, each by integrating a different type of biological information into the static PIN, and together they form a multi-layer heterogeneous network. Then the score of each protein in each network layer is calculated with an existing node ranking method, and the importance score of a protein in the MR-PIN is taken as the geometric mean of its scores across all layers. Finally, all nodes are sorted by their importance scores to determine their essentiality. To evaluate the effectiveness of the multi-layer refined network model, we apply 16 node ranking methods to the MR-PIN and compare the results with those obtained on the SPIN, DPIN and RDPIN. The predictive performance of these ranking methods is then validated in terms of the number of essential proteins identified among the top 100 to top 600 ranked proteins, sensitivity, specificity, positive predictive value, negative predictive value, F-measure, accuracy, Jackknife, ROCAUC and PRAUC. The experimental results show that the MR-PIN identifies essential proteins more accurately than the existing refined PINs.
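The scoring step described above (per-layer scores combined by a geometric mean, then sorted) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `degree_scores`, `geometric_mean_scores` and `rank_proteins`, the toy edge lists, and the use of node degree as the per-layer ranking method are all assumptions made for illustration. The abstract does not say how a protein that is missing from a layer, or that scores zero in one, should be handled; the sketch assumes its combined score is zero in that case.

```python
import math


def degree_scores(edges):
    """Toy node ranking method (an assumption): score = number of distinct neighbours."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {node: float(len(ns)) for node, ns in neighbours.items()}


def geometric_mean_scores(layer_scores):
    """Combine per-layer scores into one importance score per protein.

    layer_scores is a list of dicts, one per layer, mapping protein -> score.
    Assumption: a protein that is absent from a layer, or has a non-positive
    score there, gets a combined score of 0.0.
    """
    proteins = set().union(*layer_scores)
    n_layers = len(layer_scores)
    combined = {}
    for protein in proteins:
        values = [scores.get(protein, 0.0) for scores in layer_scores]
        if any(v <= 0 for v in values):
            combined[protein] = 0.0
        else:
            # Geometric mean computed in log space for numerical stability.
            combined[protein] = math.exp(sum(math.log(v) for v in values) / n_layers)
    return combined


def rank_proteins(combined):
    """Sort proteins by descending importance score; ties are broken by name."""
    return sorted(combined, key=lambda p: (-combined[p], p))


# Four hypothetical refined layers over the same four proteins (toy data).
layers = [
    [("A", "B"), ("A", "C"), ("A", "D")],
    [("A", "B"), ("B", "C"), ("A", "D")],
    [("A", "B"), ("A", "C"), ("C", "D")],
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")],
]
layer_scores = [degree_scores(edges) for edges in layers]
combined = geometric_mean_scores(layer_scores)
ranking = rank_proteins(combined)
```

In this toy example protein A has degrees 3, 2, 2 and 3 across the four layers, so its combined score is the fourth root of 36. Any other ranking method could be substituted for `degree_scores` without changing the combination step.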