Protein evolution through amino acid mutations is a cornerstone of life sciences. Recent advances in protein language models have revealed rich evolutionary patterns, offering unprecedented potential for in silico directed evolution. However, existing directed evolution methods rely largely on heuristic evolution strategies and have yet to efficiently integrate protein language models with advanced optimization techniques, such as reinforcement learning, to adaptively learn superior evolution policies. To bridge this gap, we propose AlphaDE, a novel framework that evolves protein sequences by harnessing paradigms from large language models, such as fine-tuning and test-time inference. First, AlphaDE fine-tunes pretrained protein language models with masked language modeling on homologous protein sequences, so that the model captures what is evolutionarily plausible for the protein family of interest. Second, AlphaDE introduces test-time inference based on Monte Carlo tree search, which evolves proteins under evolutionary guidance from the fine-tuned protein language model. Extensive benchmark experiments show that AlphaDE substantially outperforms previous state-of-the-art methods, even with few-shot fine-tuning. A case study further demonstrates that AlphaDE can condense the protein sequence space of avGFP through computational evolution.
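To make the search step concrete, the following is a minimal sketch of Monte Carlo tree search over single-residue mutations. It is not AlphaDE's implementation: the abstract specifies only that the search is a Monte Carlo tree search guided by a fine-tuned protein language model. The PUCT-style selection rule, the depth limit, `toy_fitness` (a hypothetical oracle that scores similarity to a hidden target), and `uniform_prior` (a placeholder for the mutation probabilities a fine-tuned model would supply) are all assumptions made for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_fitness(seq, target="MKVLA"):
    """Hypothetical oracle: fraction of positions that match a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def uniform_prior(seq):
    """Placeholder for a fine-tuned protein language model (assumption):
    assigns equal probability to every single-residue mutation."""
    moves = [(i, aa) for i in range(len(seq)) for aa in ALPHABET if aa != seq[i]]
    p = 1.0 / len(moves)
    return {move: p for move in moves}


class Node:
    def __init__(self, seq, prior=1.0):
        self.seq = seq
        self.prior = prior
        self.children = {}  # maps (position, amino acid) -> Node
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct):
    """PUCT-style selection; unvisited children borrow the parent's mean value."""
    def score(child):
        q = child.value() if child.visits else node.value()
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return q + u
    return max(node.children.values(), key=score)


def mcts_evolve(start, fitness, prior_fn, n_sim=200, max_depth=3, c_puct=1.0):
    """Search sequences within max_depth mutations of start; return the best seen."""
    root = Node(start)
    best_seq, best_fit = start, fitness(start)
    for _ in range(n_sim):
        node, path = root, [root]
        # Selection: descend until reaching a leaf or the depth limit.
        while node.children and len(path) <= max_depth:
            node = select_child(node, c_puct)
            path.append(node)
        # Expansion: add one child per single-residue mutation.
        if not node.children and len(path) <= max_depth:
            for (i, aa), p in prior_fn(node.seq).items():
                mutated = node.seq[:i] + aa + node.seq[i + 1:]
                node.children[(i, aa)] = Node(mutated, p)
        # Evaluation: score the reached sequence with the oracle.
        reward = fitness(node.seq)
        if reward > best_fit:
            best_seq, best_fit = node.seq, reward
        # Backup: propagate the reward along the visited path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += reward
    return best_seq, best_fit


best_seq, best_fit = mcts_evolve("AAAAA", toy_fitness, uniform_prior)
```

In a realistic setting, `uniform_prior` would be replaced by per-position residue probabilities from the fine-tuned model, and `toy_fitness` by an experimental or learned fitness oracle. The uniform prior here gives no evolutionary guidance, so the sketch shows only the search skeleton.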