Deep learning has made significant progress in protein structure prediction, advancing computational biology. Despite the high accuracy now achieved for single-chain structures, many large homo-oligomeric assemblies exhibit internal symmetry, which poses a major challenge for structure determination. Existing deep learning methods perform poorly on these assemblies because a symmetrical assembly usually has a long total sequence, which makes structure computation infeasible. In addition, the multiple identical subunits in a symmetrical protein complex make label assignment ambiguous during supervision, so training requires consistent structure modeling. To tackle these problems, we propose SGNet, a protein folding framework that models protein-protein interactions in symmetrical assemblies. SGNet extracts features from a single subunit and generates the whole assembly with our proposed symmetry module, which largely mitigates the computational problems caused by sequence length. Because symmetry is modeled consistently by design, SGNet can handle all global symmetry types in quaternary protein structure prediction. Extensive experiments on a benchmark of symmetrical protein complexes demonstrate the effectiveness of our method.
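The abstract does not say how the symmetry module or the consistent label assignment work internally. The sketch below is a rough illustration of the two general ideas only: building a full assembly from one subunit, and scoring a prediction so that the arbitrary labeling of identical chains does not matter. It assumes cyclic C_n symmetry about the z-axis through the origin, NumPy coordinate arrays, prediction and ground truth in the same frame, and ground-truth chains already ordered around the axis. `rotation_about_z`, `expand_cyclic`, and `cyclic_invariant_rmsd` are hypothetical helpers and are not SGNet's implementation.

```python
import numpy as np

def rotation_about_z(theta: float) -> np.ndarray:
    """3x3 rotation matrix about the z-axis (assumed symmetry axis)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def expand_cyclic(subunit: np.ndarray, n: int) -> np.ndarray:
    """Replicate one subunit (L x 3 coordinates) into a C_n assembly (n x L x 3).

    Copy k is the subunit rotated by 2*pi*k/n. Coordinates are row vectors,
    so a point p maps to R @ p, i.e. subunit @ R.T for the whole array.
    """
    return np.stack(
        [subunit @ rotation_about_z(2 * np.pi * k / n).T for k in range(n)]
    )

def cyclic_invariant_rmsd(pred: np.ndarray, target: np.ndarray) -> float:
    """Smallest RMSD over the n cyclic relabelings of the target's chains.

    Identical chains can be labeled in any rotational order, so the loss
    takes the best match instead of trusting one fixed chain order.
    """
    n = pred.shape[0]
    best = np.inf
    for shift in range(n):
        relabeled = np.roll(target, shift, axis=0)  # shift chain labels
        rmsd = np.sqrt(np.mean(np.sum((pred - relabeled) ** 2, axis=-1)))
        best = min(best, rmsd)
    return float(best)
```

For example, `asm = expand_cyclic(subunit, 4)` builds a tetramer from one predicted chain, and `cyclic_invariant_rmsd(asm, np.roll(asm, 1, axis=0))` is zero even though the chain labels disagree. Only the n cyclic relabelings are searched rather than all n! permutations, since for an ordered C_n assembly those are the relabelings that preserve the rotational arrangement; other symmetry types (dihedral, cubic) would need their own group of rotations and relabelings.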