Constructing a scaffold structure that supports a desired motif, which confers protein function, shows promise for the design of vaccines and enzymes. But the general motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.
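The name SMCDiff points to sequential Monte Carlo (particle filtering), in which a population of weighted samples is repeatedly reweighted and resampled. The sketch below is not the SMCDiff algorithm itself; it only illustrates a generic resampling step of the kind such methods rely on, assuming each particle carries an unnormalized log-weight (for example, a score of how well it matches the motif). The function names `normalize_log_weights` and `systematic_resample` are hypothetical and chosen for illustration.

```python
import math
import random


def normalize_log_weights(log_weights):
    """Turn unnormalized log-weights into probabilities that sum to 1."""
    m = max(log_weights)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(lw - m) for lw in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


def systematic_resample(weights, rng):
    """Return len(weights) particle indices chosen by systematic resampling.

    A single uniform draw sets an offset; n evenly spaced positions are then
    matched against the cumulative weights, so a particle with weight w is
    selected roughly n * w times.
    """
    n = len(weights)
    offset = rng.random() / n  # one uniform draw in [0, 1/n)
    indices = []
    i = 0
    cumulative = weights[0]
    for k in range(n):
        position = offset + k / n
        # Advance to the particle whose cumulative-weight interval
        # contains this position.
        while position >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical log-weights for four particles (illustrative values).
    weights = normalize_log_weights([-2.0, -0.5, -3.0, -0.1])
    print(systematic_resample(weights, rng))
```

After resampling, high-weight particles are duplicated and low-weight ones are dropped, which concentrates computation on samples that are consistent with the conditioning information.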