Protein design often begins with knowledge of a desired function carried by a motif, and motif-scaffolding aims to construct a functional protein around that motif. Recently, generative models have achieved breakthrough success in designing scaffolds for a diverse range of motifs. However, the generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow and requires no additional training. Both approaches achieve an equivalent or higher success rate than previous state-of-the-art methods, with 2.5 times more structurally diverse scaffolds. Code: https://github.com/microsoft/frame-flow.
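To make the data augmentation idea behind motif amortization more concrete, the following is a minimal sketch of how a motif could be carved out of an unlabeled training backbone on the fly. It is an illustration under stated assumptions, not the procedure used in FrameFlow: the function name `sample_motif_mask`, the fraction bounds `min_frac` and `max_frac`, and the segment limit `max_segments` are all hypothetical choices made for this example.

```python
import numpy as np


def sample_motif_mask(num_residues, rng, min_frac=0.05, max_frac=0.5, max_segments=3):
    """Return a boolean mask marking which residues are treated as the motif.

    Hypothetical augmentation: all parameter values here are assumptions,
    not settings taken from the paper.
    """
    # Pick how many residues the motif may cover (at least one).
    frac = rng.uniform(min_frac, max_frac)
    motif_size = max(1, int(round(frac * num_residues)))

    # Split that budget into a few contiguous segments of length >= 1.
    num_segments = int(rng.integers(1, min(max_segments, motif_size) + 1))
    if num_segments > 1:
        cuts = np.sort(
            rng.choice(np.arange(1, motif_size), size=num_segments - 1, replace=False)
        )
    else:
        cuts = np.array([], dtype=int)
    lengths = np.diff(np.concatenate(([0], cuts, [motif_size])))

    # Place each segment at a random position; segments may overlap,
    # so the final motif can be smaller than motif_size.
    mask = np.zeros(num_residues, dtype=bool)
    for length in lengths:
        start = int(rng.integers(0, num_residues - length + 1))
        mask[start:start + length] = True
    return mask


rng = np.random.default_rng(0)
mask = sample_motif_mask(100, rng)
# Residues where mask is True would be given to the model as the motif;
# the remaining residues form the scaffold the model learns to generate.
print(mask.sum(), "motif residues out of", mask.size)
```

Sampling a fresh mask each time a backbone is drawn means the same structure can serve as many different motif-scaffolding examples, which is the general appeal of an augmentation-based approach when no motif annotations are available.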