Identifying essential proteins helps in understanding the minimum requirements for cell survival and development. Network-based centrality approaches are commonly used to identify essential proteins from protein-protein interaction networks (PINs). Unfortunately, these approaches are limited by the poor quality of the underlying PIN data. To overcome this problem, researchers have focused on predicting essential proteins by combining PINs with other biological data. In this paper, we propose a network refinement method based on module discovery and biological information to obtain a higher-quality PIN. First, we extract the maximal connected subgraph of the PIN and divide it into modules using the Fast-unfolding algorithm. Then, we detect critical modules based on the homology information, subcellular localization information, and topology information within each module, and construct a more refined network (CM-PIN) from them. To evaluate the effectiveness of the proposed method, we used 10 typical network-based centrality methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR) to compare the overall performance of the CM-PIN with that of the refined dynamic protein network (RD-PIN). The experimental results show that the CM-PIN performs best in terms of precision-recall curves, jackknife curves, and other criteria, and can help identify essential proteins more accurately.
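The first stage of the pipeline described above, together with one of the listed centrality baselines, can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it reads "maximal connected subgraph" as the largest connected component, takes DC to mean degree centrality, and uses hypothetical protein names (`P1` to `P7`) and helper functions (`build_graph`, `largest_connected_component`, `induced_subgraph`, `degree_centrality_ranking`) that do not come from the paper. The module division by Fast-unfolding (commonly known as the Louvain method) and the critical-module scoring are not shown.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build undirected adjacency sets from (protein, protein) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def largest_connected_component(adj):
    """Return the node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def induced_subgraph(adj, nodes):
    """Restrict the adjacency sets to the given node set."""
    return {n: adj[n] & nodes for n in nodes}


def degree_centrality_ranking(adj):
    """Rank proteins by degree (assumed meaning of DC), ties broken by name."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))


# Toy interaction list with hypothetical protein names (assumption).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P3"), ("P4", "P5"), ("P6", "P7")]

pin = build_graph(edges)
core = induced_subgraph(pin, largest_connected_component(pin))
print(degree_centrality_ranking(core))
```

In the toy network, the isolated pair `P6`/`P7` is dropped before ranking, which mirrors the idea of discarding low-confidence fringe interactions before applying a centrality measure. A full reproduction would next partition `core` into modules and filter them with homology and subcellular localization data.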