The design of novel protein sequences with targeted functionalities is a central theme in protein engineering, with impact on diverse fields such as drug discovery and enzyme engineering. However, navigating the vast combinatorial space of possible sequences remains a severe challenge under time and financial constraints. This situation is changing rapidly as advances in AI, particularly in generative models and optimization algorithms, reshape the protein design field. In this survey, we systematically review recent advances in generative AI for controllable protein sequence design. To set the stage, we first outline the foundational tasks in protein sequence design in terms of the constraints involved, and present key generative models and optimization algorithms. We then offer in-depth reviews of each design task and discuss the pertinent applications. Finally, we identify unresolved challenges and highlight research opportunities that merit deeper exploration.