Protein structure prediction has reached revolutionary levels of accuracy on single structures, yet distributional modeling paradigms are needed to capture the conformational ensembles and flexibility that underlie biological function. Towards this goal, we develop EigenFold, a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence. We define a diffusion process that models the structure as a system of harmonic oscillators and naturally induces a cascading-resolution generative process along the eigenmodes of the system. On recent CAMEO targets, EigenFold achieves a median TM-score of 0.84 while, through its ensemble of sampled structures, providing a more comprehensive picture of model uncertainty than existing methods. We then assess EigenFold's ability to model and predict conformational heterogeneity for fold-switching proteins and ligand-induced conformational change. Code is available at https://github.com/bjing2016/EigenFold.
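To make the harmonic-oscillator picture concrete, the following is a minimal NumPy sketch and not the EigenFold implementation. It assumes a simple bead chain with energy E(x) = (alpha/2) * sum_i ||x_i - x_{i+1}||^2 and a forward process of the form dx = -(1/2) H x dt + dW, where H is the Hessian of E. Under that assumption, each eigenmode of H evolves as an independent Ornstein-Uhlenbeck process, and the modes with the largest eigenvalues (fine detail) lose their signal first, which is one way to read "cascading resolution". The names `chain_laplacian`, `forward_marginal`, and the spring constant `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np

def chain_laplacian(n, alpha=1.0):
    """Hessian of E(x) = (alpha/2) * sum_i ||x_i - x_{i+1}||^2 for an n-bead chain."""
    H = np.zeros((n, n))
    idx = np.arange(n - 1)
    H[idx, idx] += alpha
    H[idx + 1, idx + 1] += alpha
    H[idx, idx + 1] -= alpha
    H[idx + 1, idx] -= alpha
    return H

def forward_marginal(x0, t, alpha=1.0, rng=None):
    """Sample x_t given x_0 under the assumed SDE dx = -(1/2) H x dt + dW.

    x0 has shape (n, 3). Returns (x_t, eigenvalues of the nonzero modes).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x0.shape[0]
    lam, P = np.linalg.eigh(chain_laplacian(n, alpha))  # eigenvalues in ascending order
    lam, P = lam[1:], P[:, 1:]       # drop the zero mode (rigid translation)
    y0 = P.T @ x0                    # project coordinates onto the eigenmodes
    # Each mode is an independent OU process with rate lam / 2:
    # mean y0 * exp(-lam * t / 2), variance (1 - exp(-lam * t)) / lam.
    mean = y0 * np.exp(-0.5 * lam * t)[:, None]
    std = np.sqrt((1.0 - np.exp(-lam * t)) / lam)[:, None]
    yt = mean + std * rng.standard_normal(y0.shape)
    return P @ yt, lam               # back to (centered) Cartesian coordinates

# Usage: a random 8-bead "structure" noised to time t = 2.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 3))
xt, lam = forward_marginal(x0, t=2.0, rng=rng)
retained = np.exp(-0.5 * lam * 2.0)  # fraction of signal left in each mode
```

Because `lam` is sorted in ascending order, `retained` is non-increasing: the low-frequency (global shape) modes keep their signal longest. A reverse-time sampler under these assumptions would therefore fix coarse structure first and fine detail last.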