The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in R³, that operates on frames and confers group invariance. We address this gap by developing the theoretical foundations of SE(3)-invariant diffusion models on multiple frames, followed by a novel framework, FrameDiff, for learning the SE(3)-equivariant score over multiple frames. We apply FrameDiff to monomer backbone generation and find that it can generate designable monomers of up to 500 amino acids without relying on a pretrained protein structure prediction network, which has been integral to previous methods. We find that our samples are capable of generalizing beyond any known protein structure.
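To make the notion of group invariance over multiple frames concrete, the following is a minimal sketch and not the FrameDiff implementation. It assumes each frame is a pair (R, t) of a 3x3 rotation matrix and a translation vector, and checks that the relative pose between two frames is unchanged when the same rigid motion is applied to both. All helper names (`compose`, `inverse`, `relative`, and so on) are hypothetical and introduced only for illustration.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a length-3 vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    """Rotation by theta radians about the x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def compose(g, f):
    """Apply rigid motion g = (R_g, t_g) to frame f = (R, t):
    the result is (R_g R, R_g t + t_g)."""
    (rg, tg), (r, t) = g, f
    rotated_t = matvec(rg, t)
    return (matmul(rg, r), [rotated_t[i] + tg[i] for i in range(3)])

def inverse(f):
    """Inverse of (R, t) is (R^T, -R^T t)."""
    r, t = f
    r_t = transpose(r)
    return (r_t, [-x for x in matvec(r_t, t)])

def relative(f1, f2):
    """Pose of frame f2 expressed in the coordinates of frame f1."""
    return compose(inverse(f1), f2)

def max_abs_diff(f, g):
    """Largest entrywise difference between two frames."""
    (r1, t1), (r2, t2) = f, g
    dr = max(abs(r1[i][j] - r2[i][j]) for i in range(3) for j in range(3))
    dt = max(abs(t1[i] - t2[i]) for i in range(3))
    return max(dr, dt)

# Two arbitrary frames (illustrative values, not protein data).
frames = [
    (rot_z(0.3), [1.0, 0.0, 0.0]),
    (rot_x(1.1), [0.0, 2.0, -1.0]),
]

# One global rigid motion applied to every frame.
g = (matmul(rot_z(0.7), rot_x(-0.4)), [5.0, -3.0, 2.0])
moved = [compose(g, f) for f in frames]

before = relative(frames[0], frames[1])
after = relative(moved[0], moved[1])

# The relative pose agrees up to floating-point error.
print(max_abs_diff(before, after) < 1e-9)
```

The check relies on the identity (g f1)^{-1} (g f2) = f1^{-1} f2: quantities built from relative frames are unaffected by a global rigid motion. This is the sense in which a distribution over multiple frames can be SE(3)-invariant.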