Protein-protein interactions are essential to biochemical processes. Accurate prediction of protein-protein interaction sites (PPIs) deepens our understanding of biological mechanisms and is crucial for new drug design. However, conventional experimental methods for identifying PPIs are costly and time-consuming, so many computational approaches, especially machine-learning-based methods, have been developed recently. Although these approaches have achieved promising results, two limitations remain: (1) most models exploit some useful input features but fail to take coevolutionary features into account, which could provide clues about inter-residue relationships; (2) attention-based models allocate attention weights only to neighboring residues rather than globally, neglecting that residues far from the target residue might also matter. We propose CoGANPPIS, a coevolution-enhanced global attention neural network, which is a sequence-based deep learning model for PPIs prediction. It uses three layers in parallel for feature extraction: (1) a local-level representation aggregation layer, which aggregates the features of neighboring residues; (2) a global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all residues on the same protein sequence; (3) a coevolutionary information learning layer, which applies a CNN with pooling to coevolutionary information to obtain the coevolutionary profile representation. The three outputs are then concatenated and passed into several fully connected layers for the final prediction. Experiments on two benchmark datasets show that our model achieves state-of-the-art performance. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code.
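The three parallel branches described above can be illustrated with a minimal pure-Python sketch. This is not the CoGANPPIS implementation: the abstract does not give the formulas, so every detail here is an assumption made for illustration. In particular, the function names (`local_aggregate`, `global_attention`, `coev_profile`, `predict_site`), the mean-pooling window, the use of the coevolution score as an additive bias on dot-product attention, the fixed smoothing kernel standing in for a learned CNN filter, and the single linear layer standing in for "several fully connected layers" are all hypothetical choices.

```python
import math
import random


def local_aggregate(feats, i, window=1):
    """Branch 1 (assumed form): mean of the features within `window` residues of i."""
    lo, hi = max(0, i - window), min(len(feats), i + window + 1)
    dim = len(feats[0])
    return [sum(feats[j][k] for j in range(lo, hi)) / (hi - lo) for k in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(feats, coev, i):
    """Branch 2 (assumed form): residue i attends to every residue in the sequence.

    The coevolution score coev[i][j] is added to the scaled dot-product score
    as a bias; this is one plausible reading of "coevolution-enhanced".
    """
    dim = len(feats[0])
    scores = [
        sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(dim) + coev[i][j]
        for j in range(len(feats))
    ]
    weights = softmax(scores)
    context = [sum(w * feats[j][k] for j, w in enumerate(weights)) for k in range(dim)]
    return context, weights


def coev_profile(coev_row, kernel=(0.25, 0.5, 0.25)):
    """Branch 3 (assumed form): 1D valid convolution over one coevolution row,
    followed by global max pooling and global mean pooling."""
    k = len(kernel)
    conv = [
        sum(kernel[t] * coev_row[s + t] for t in range(k))
        for s in range(len(coev_row) - k + 1)
    ]
    return [max(conv), sum(conv) / len(conv)]


def predict_site(feats, coev, i, fc_weights, bias=0.0):
    """Concatenate the three branch outputs and apply one linear layer + sigmoid."""
    local = local_aggregate(feats, i)
    context, attn = global_attention(feats, coev, i)
    profile = coev_profile(coev[i])
    x = local + context + profile  # list concatenation
    logit = sum(w * v for w, v in zip(fc_weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit)), attn


# Toy inputs: random per-residue features and a random coevolution matrix.
random.seed(0)
L, D = 6, 4  # sequence length, feature dimension
feats = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
coev = [[random.random() for _ in range(L)] for _ in range(L)]
fc_weights = [random.uniform(-0.5, 0.5) for _ in range(2 * D + 2)]

prob, attn = predict_site(feats, coev, 2, fc_weights)
print(f"P(residue 2 is an interaction site) = {prob:.3f}")
print("attention over all residues:", [round(a, 3) for a in attn])
```

The point of the sketch is the contrast between the first two branches: `local_aggregate` only sees a fixed window around the target residue, whereas `global_attention` produces one weight per residue in the whole sequence, so a distant residue with a high coevolution score can still contribute to the prediction.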