Deep learning has transformed protein design, enabling accurate structure prediction, sequence optimization, and de novo protein generation. Single-chain structure prediction methods such as AlphaFold2, RoseTTAFold, and ESMFold have achieved near-experimental accuracy, inspiring subsequent work that extends prediction to biomolecular complexes, including AlphaFold-Multimer, RoseTTAFold All-Atom, AlphaFold 3, Chai-1, and Boltz-1. Generative models such as ProtGPT2, ProteinMPNN, and RFdiffusion have enabled sequence and backbone design beyond the limits of natural evolution. More recently, joint sequence-structure co-design models, including ESM3, have integrated both modalities into a unified framework, resulting in improved designability. Despite these advances, challenges remain in modeling sequence-structure-function relationships and in ensuring robust generalization beyond the regions of protein space spanned by the training data. Future advances will likely focus on joint sequence-structure-function co-design frameworks that model the fitness landscape more effectively than models that treat these modalities independently. Current capabilities, coupled with the rapid rate of progress, suggest that the field will soon enable rapid, rational design of proteins with tailored structures and functions that transcend the limitations imposed by natural evolution. In this review, we discuss the current capabilities of deep learning methods for protein design, focusing on some of the most influential and capable models, their functionality, and the applications they enable, and we conclude with the field's current challenges and the most promising path forward.