FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction

arXiv preprint: 15 pages, 2 tables, 2 algorithms, 11 figures. Code, data, pre-trained models, and baseline method predictions are available at https://github.com/BioinfoMachineLearning/FlowDock

Powerful generative AI models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently or have been rigorously benchmarked on pharmacologically relevant drug targets, hindering their widespread adoption in drug discovery efforts. In this work, we propose FlowDock, the first deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. For the well-known PoseBusters Benchmark dataset, FlowDock outperforms single-sequence AlphaFold 3 with a 51% blind docking success rate using unbound (apo) protein input structures and without any information derived from multiple sequence alignments, and for the challenging new DockGen-E dataset, FlowDock outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top-5 methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening. Source code, data, and pre-trained models are available at https://github.com/BioinfoMachineLearning/FlowDock.
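To make the idea of "conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts" concrete, the following is a minimal sketch of the generic straight-line flow matching objective on toy coordinates. It is not FlowDock's implementation: the function names, the linear interpolation path, the plain mean-squared-error loss, and the random stand-in coordinates are all assumptions chosen for illustration, and the real model would use a learned geometric network in place of the oracle velocity used here.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the straight-line path; constant in t (assumed path)."""
    return x1 - x0


def flow_matching_loss(predicted_velocity, x0, x1):
    """Mean squared error between a predicted and the target velocity."""
    return float(np.mean((predicted_velocity - target_velocity(x0, x1)) ** 2))


def euler_integrate(velocity_fn, x0, num_steps=10):
    """Push x0 along velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for apo and holo atom coordinates (5 atoms, 3D).
    apo = rng.normal(size=(5, 3))
    holo = apo + 0.5 * rng.normal(size=(5, 3))

    # Training-time quantities: a point on the path and its regression target.
    t = 0.3
    x_t = interpolate(apo, holo, t)
    print("loss of a zero predictor:", flow_matching_loss(np.zeros_like(x_t), apo, holo))

    # Sampling with an oracle velocity (a trained network would go here).
    generated = euler_integrate(lambda x, t: target_velocity(apo, holo), apo)
    print("recovers holo:", np.allclose(generated, holo))
```

With a perfect velocity predictor, integrating from the apo coordinates lands on the holo coordinates. In practice the predictor is learned from many apo/holo pairs and conditioned on the current state and time.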

