Protein-protein interactions (PPIs) are central to many biological processes, and their study has significant implications for drug development and disease diagnosis. Existing deep learning methods degrade significantly in complex real-world scenarios because of factors such as label scarcity and domain shift. In this paper, we propose a self-ensembling multigraph neural network (SemiGNN-PPI) that predicts PPIs effectively while being both efficient and generalizable. In SemiGNN-PPI, we not only model protein correlations but also explore label dependencies by constructing and processing multiple graphs, from the perspectives of both features and labels, during graph learning. We further combine the GNN with Mean Teacher to leverage unlabeled graph-structured PPI data for self-ensemble graph learning. We also design multiple graph consistency constraints that align the student and teacher graphs in the feature embedding space, so that the student model learns more from the teacher model by incorporating additional relationships. Extensive experiments on PPI datasets of different scales and under different evaluation settings demonstrate that SemiGNN-PPI outperforms state-of-the-art PPI prediction methods, particularly in challenging scenarios such as training with limited annotations and testing on unseen data.
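To make the Mean Teacher idea in the abstract concrete, the following is a minimal sketch of the generic technique, not the SemiGNN-PPI implementation. It assumes the standard formulation: the teacher's parameters are an exponential moving average (EMA) of the student's, and a consistency term penalizes disagreement between student and teacher outputs on the same input. The names `ema_update`, `consistency_loss`, and `alpha`, the toy parameter dictionaries, and the choice of mean squared error are illustrative assumptions; the abstract does not specify these details, and its graph consistency constraints operate on graphs in the feature embedding space rather than on plain output vectors.

```python
from typing import Dict, List, Sequence


def ema_update(
    teacher: Dict[str, List[float]],
    student: Dict[str, List[float]],
    alpha: float = 0.99,
) -> None:
    """Update teacher in place: teacher <- alpha * teacher + (1 - alpha) * student.

    `alpha` is a hypothetical smoothing coefficient; values close to 1 make
    the teacher change slowly, which is what makes it a temporal ensemble.
    """
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = alpha * teacher_values[i] + (1.0 - alpha) * s


def consistency_loss(
    student_out: Sequence[float], teacher_out: Sequence[float]
) -> float:
    """Mean squared difference between student and teacher predictions.

    MSE is an assumption here; it is one common choice of consistency term.
    """
    if len(student_out) != len(teacher_out):
        raise ValueError("outputs must have the same length")
    total = sum((s - t) ** 2 for s, t in zip(student_out, teacher_out))
    return total / len(student_out)


# Toy parameters standing in for model weights (illustrative only).
teacher_params = {"w": [0.0, 1.0]}
student_params = {"w": [1.0, 1.0]}
ema_update(teacher_params, student_params, alpha=0.9)
loss = consistency_loss([1.0, 0.0], [0.0, 0.0])
```

In a training loop of this kind, only the student would receive gradient updates, from a supervised loss on labeled pairs plus the consistency term on all pairs, and `ema_update` would be called after each optimizer step.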