Powerful generative AI models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently, and none has been rigorously benchmarked on pharmacologically relevant drug targets, which hinders their widespread adoption in drug discovery efforts. In this work, we propose FlowDock, the first deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. On the well-known PoseBusters Benchmark dataset, FlowDock achieves a 51% blind docking success rate, outperforming single-sequence AlphaFold 3, while using unbound (apo) protein input structures and no information derived from multiple sequence alignments. On the challenging new DockGen-E dataset, FlowDock outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top five methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening. Source code, data, and pre-trained models are available at https://github.com/BioinfoMachineLearning/FlowDock.
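To make the idea of mapping apo coordinates to holo coordinates with conditional flow matching concrete, the following is a minimal sketch of the generic straight-line formulation only. It is not FlowDock's implementation: the abstract does not specify the interpolant, prior, network, or solver, so every name here (`interpolate`, `target_velocity`, `cfm_loss`, `euler_integrate`) and the toy coordinates are hypothetical, and an oracle velocity field stands in for a trained model.

```python
# Illustrative sketch of conditional flow matching on 3D coordinates.
# Assumption: a linear path x_t = (1 - t) * x0 + t * x1 between an
# apo-like start x0 and a holo-like end x1. All names are hypothetical.

def interpolate(x0, x1, t):
    """Return the point on the straight-line path at time t in [0, 1]."""
    return [tuple((1 - t) * a + t * b for a, b in zip(p0, p1))
            for p0, p1 in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the linear path, d x_t / dt = x1 - x0 (constant in t)."""
    return [tuple(b - a for a, b in zip(p0, p1)) for p0, p1 in zip(x0, x1)]

def cfm_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    sq = [(p - q) ** 2
          for vp, vt in zip(v_pred, v_true)
          for p, q in zip(vp, vt)]
    return sum(sq) / len(sq)

def euler_integrate(predict_velocity, x0, steps=10):
    """Transport x0 from t = 0 to t = 1 with fixed-step Euler updates."""
    x = [tuple(p) for p in x0]
    dt = 1.0 / steps
    for k in range(steps):
        v = predict_velocity(x, k * dt)
        x = [tuple(c + dt * vc for c, vc in zip(p, vp))
             for p, vp in zip(x, v)]
    return x

# Toy two-atom example with made-up coordinates.
apo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
holo = [(0.5, 0.2, 0.0), (1.5, -0.1, 0.3)]

def oracle(x_t, t):
    """Stand-in for a trained network: returns the exact path velocity."""
    return target_velocity(apo, holo)

loss = cfm_loss(oracle, apo, holo, t=0.3)
generated = euler_integrate(oracle, apo, steps=10)
```

In this toy setting the oracle attains zero loss and Euler integration carries the apo coordinates onto the holo coordinates up to floating-point error. In practice a learned network would replace `oracle` and be trained by minimizing `cfm_loss` over sampled structure pairs and time steps.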