Proteins power a vast array of functional processes in living cells. The ability to create new proteins with designed structures and functions would thus enable the engineering of cellular behavior and the development of protein-based therapeutics and materials. Structure-based protein design aims to find structures that are designable (can be realized by a protein sequence), novel (geometrically dissimilar from natural proteins), and diverse (span a wide range of geometries). While advances in protein structure prediction have made it possible to predict the structures of novel protein sequences, the combinatorially large space of sequences and structures limits the practicality of search-based methods. Generative models provide a compelling alternative by implicitly learning the low-dimensional structure of complex data distributions. Here, we leverage recent advances in denoising diffusion probabilistic models and equivariant neural networks to develop Genie, a generative model of protein structures that performs discrete-time diffusion using a cloud of oriented reference frames in 3D space. Through in silico evaluations, we demonstrate that Genie generates protein backbones that are more designable, novel, and diverse than those generated by existing models. This indicates that Genie captures key aspects of the distribution of protein structure space and facilitates protein design with high success rates. Code for generating new proteins and training new versions of Genie is available at https://github.com/aqlaboratory/genie.
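To make "discrete-time diffusion using a cloud of oriented reference frames" more concrete, the following is a minimal illustrative sketch in NumPy, not code from the Genie repository. It assumes that Gaussian noise is added to one 3D point per residue under a standard denoising diffusion (DDPM) forward process, and that an oriented frame is then derived at each interior point from three consecutive points. The linear noise schedule, its parameter values, and the function names `linear_beta_schedule`, `forward_noise`, and `frames_from_trace` are all assumptions introduced here for illustration.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear variance schedule over discrete time steps."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    x0: (N, 3) array with one 3D point per residue (assumed representation).
    Returns the noised points and the noise that was drawn.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


def frames_from_trace(coords, tol=1e-8):
    """Build one right-handed orthonormal frame per interior point.

    Each frame is computed from three consecutive points, in the style of a
    discrete Frenet-Serret construction. Returns rotations of shape
    (N - 2, 3, 3), with columns (tangent, normal, binormal), and the frame
    origins of shape (N - 2, 3).
    """
    v_prev = coords[1:-1] - coords[:-2]   # segment arriving at point i
    v_next = coords[2:] - coords[1:-1]    # segment leaving point i
    tangent = v_next / (np.linalg.norm(v_next, axis=-1, keepdims=True) + tol)
    binormal = np.cross(v_prev, v_next)
    binormal /= np.linalg.norm(binormal, axis=-1, keepdims=True) + tol
    normal = np.cross(binormal, tangent)  # completes the right-handed triad
    rotations = np.stack([tangent, normal, binormal], axis=-1)
    return rotations, coords[1:-1]


# Usage: noise a toy 16-point trace at step t = 500 and rebuild its frames.
rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)
x0 = rng.standard_normal((16, 3))
xt, eps = forward_noise(x0, 500, alpha_bar, rng)
rotations, origins = frames_from_trace(xt)
```

Because the frames are recomputed from the noised points, a global rotation of the points rotates the frames with them. This is the kind of geometric consistency that an equivariant network can exploit.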