The linear sequence of amino acids determines a protein's structure and function. Protein design, the inverse of protein structure prediction, aims to find a novel amino acid sequence that folds into a given structure. Recent work on computational protein design has generated sequences for a desired backbone structure from local positional information and achieved competitive performance. However, similar local environments in different backbone structures can correspond to different amino acids, which indicates that the global context of the protein structure matters. We therefore propose the Global-Context Aware generative de novo protein design method (GCA), which consists of local and global modules. The local modules focus on relationships between neighboring amino acids, while the global modules explicitly capture non-local context. Experimental results show that GCA outperforms state-of-the-art methods on de novo protein design. Our code and pretrained model will be released.