The design of novel protein structures remains a challenge in protein engineering, with applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers group invariance. We address this gap by developing the theoretical foundations of SE(3)-invariant diffusion models on multiple frames and then introducing a novel framework, FrameDiff, for learning the SE(3)-equivariant score over multiple frames. We apply FrameDiff to monomer backbone generation and find that it can generate designable monomers of up to 500 amino acids without relying on a pretrained protein structure prediction network, which has been integral to previous methods. We also find that our samples generalize beyond any known protein structure.
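To make the notions of "frame" and "group invariance" concrete, the following is a minimal NumPy sketch, not FrameDiff's implementation. It assumes a frame is stored as a rotation matrix plus a translation vector, applies one global rigid motion to a set of frames, and checks that the relative frames are unchanged. The function names (`random_rotation`, `apply_global`, `relative_frames`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def random_rotation(rng):
    """Draw a 3x3 rotation matrix (orthogonal, determinant +1)."""
    # QR of a Gaussian matrix yields an orthogonal factor q.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of each column
    if np.linalg.det(q) < 0:         # flip one column to obtain det = +1
        q[:, 0] = -q[:, 0]
    return q

def apply_global(R_g, t_g, rots, trans):
    """Act with one rigid motion (R_g, t_g) on every frame (R_i, x_i).

    The action is (R_g R_i, R_g x_i + t_g).
    rots has shape (N, 3, 3); trans has shape (N, 3).
    """
    return R_g @ rots, trans @ R_g.T + t_g

def relative_frames(rots, trans):
    """Return T_i^{-1} T_j = (R_i^T R_j, R_i^T (x_j - x_i)) for all pairs."""
    rel_R = np.einsum("iab,jac->ijbc", rots, rots)
    diff = trans[None, :, :] - trans[:, None, :]   # diff[i, j] = x_j - x_i
    rel_t = np.einsum("iab,ija->ijb", rots, diff)
    return rel_R, rel_t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_frames = 5  # e.g. one frame per residue in a tiny backbone
    rots = np.stack([random_rotation(rng) for _ in range(n_frames)])
    trans = rng.normal(size=(n_frames, 3))

    # One global rigid motion applied to the whole set of frames.
    R_g, t_g = random_rotation(rng), rng.normal(size=3)
    rots_moved, trans_moved = apply_global(R_g, t_g, rots, trans)

    before = relative_frames(rots, trans)
    after = relative_frames(rots_moved, trans_moved)
    unchanged = all(np.allclose(a, b) for a, b in zip(before, after))
    print("relative frames unchanged by the global motion:", unchanged)
```

The individual frames change under the global motion while their pairwise relationships do not, which is the sense in which a distribution over sets of frames can be invariant to SE(3).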