Rank-and-Reason: Multi-Agent Collaboration Accelerates Zero-Shot Protein Mutation Prediction

Zero-shot mutation prediction is vital for low-resource protein engineering, yet existing protein language models (PLMs) often yield statistically confident results that ignore fundamental biophysical constraints. Currently, selecting candidates for wet-lab validation relies on manual expert auditing of PLM outputs, a process that is inefficient, subjective, and highly dependent on domain expertise. To address this, we propose Rank-and-Reason (VenusRAR), a two-stage agentic framework that automates this workflow and maximizes expected wet-lab fitness. In the Rank-Stage, a Computational Expert and a Virtual Biologist aggregate a context-aware multi-modal ensemble, establishing a new Spearman correlation record of 0.551 (vs. 0.518) on ProteinGym. In the Reason-Stage, an agentic Expert Panel employs chain-of-thought reasoning to audit candidates against geometric and structural constraints, improving the Top-5 Hit Rate by up to 367% on ProteinGym-DMS99. Wet-lab validation on the Cas12i3 nuclease further confirms the framework's efficacy, achieving a 46.7% positive rate and identifying two novel mutants with 4.23-fold and 5.05-fold activity improvements. Code and datasets are released on GitHub (https://github.com/ai4protein/VenusRAR/).
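
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how a Spearman correlation and a Top-k hit rate could be computed for a set of scored mutants. It is not taken from the VenusRAR code base: the function names, the toy data, and in particular the definition of a "hit" (measured fitness above a chosen threshold, such as the wild-type value) are assumptions made for illustration, and the paper's exact definitions may differ.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rp, rt = average_ranks(pred), average_ranks(truth)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    return cov / (var_p * var_t) ** 0.5


def top_k_hit_rate(pred: Sequence[float], truth: Sequence[float],
                   threshold: float, k: int = 5) -> float:
    """Fraction of the k highest-scored mutants whose measured fitness
    exceeds `threshold` (assumed hit definition, see the note above)."""
    top = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)[:k]
    return sum(truth[i] > threshold for i in top) / len(top)


# Toy data (hypothetical): model scores and measured fitness for five mutants.
scores = [0.9, 0.1, 0.5, 0.7, 0.3]
fitness = [1.2, 0.2, 0.8, 1.5, 0.4]
print(spearman(scores, fitness))
print(top_k_hit_rate(scores, fitness, threshold=1.0, k=2))
```

Under this reading, an improvement "by up to 367%" is a relative gain: if a baseline hit rate is h, the improved rate would be (1 + 3.67) × h.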

