DiffDock, a diffusion learning method for docking small-molecule ligands into protein binding sites, was recently introduced. The reported results included comparisons to more conventional docking approaches, with DiffDock showing superior performance. Here, we employ a fully automatic workflow using the Surflex-Dock methods to generate a fair baseline for conventional docking approaches. Results were generated for the common and expected situation where the binding site location is known and also for the condition of an unknown binding site. For the known binding site condition, Surflex-Dock success rates at 2.0 Angstroms RMSD far exceeded those of DiffDock (Top-1/Top-5 success rates of 68/81% compared with 45/51%). Glide achieved success rates (67/73%) similar to those of Surflex-Dock for the known binding site condition, and results for AutoDock Vina and Gnina followed this pattern. For the unknown binding site condition, using an automated method to identify multiple binding pockets, Surflex-Dock success rates again exceeded those of DiffDock, but by a somewhat smaller margin. DiffDock was trained on roughly 17,000 co-crystal structures (98% of PDBBind version 2020, comprising pre-2019 structures) in order to predict on 363 test cases (2% of PDBBind 2020) from 2019 forward. DiffDock's performance was inextricably linked to the presence in the training set of near-neighbor cases, that is, nearly identical protein-ligand complexes, for over half of the test set cases. DiffDock exhibited a 40 percentage point difference in performance between near-neighbor cases (two-thirds of all test cases) and cases with no near-neighbor training case. DiffDock has apparently encoded a type of table lookup during its learning process, which puts meaningful applications beyond its reach. Further, its performance is not close to competitive with that of a competently run modern docking workflow.
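To make the Top-1/Top-5 success rates at a 2.0 Angstrom RMSD threshold concrete, the following is a minimal Python sketch of how such a metric is commonly computed. It is not the evaluation code used in this work: the function name `top_k_success_rate`, the inclusive `<=` comparison, and the toy RMSD values are all assumptions made purely for illustration.

```python
from typing import Sequence


def top_k_success_rate(
    rmsds_per_case: Sequence[Sequence[float]],
    k: int,
    threshold: float = 2.0,
) -> float:
    """Fraction of test cases with at least one of the k top-ranked poses
    at or below the RMSD threshold (in Angstroms).

    Each inner sequence holds the pose RMSDs for one test case, ordered by
    the docking method's own ranking (best-scored pose first).
    Assumption: the threshold is inclusive; some evaluations use strict '<'.
    """
    if not rmsds_per_case:
        raise ValueError("need at least one test case")
    hits = sum(
        1
        for ranked in rmsds_per_case
        if any(rmsd <= threshold for rmsd in ranked[:k])
    )
    return hits / len(rmsds_per_case)


# Hypothetical toy data: four test cases, three ranked poses each.
toy = [
    [1.2, 3.5, 4.0],  # success at Top-1
    [2.8, 1.9, 5.1],  # success only once the second pose is considered
    [6.0, 7.2, 8.3],  # failure at any k
    [0.9, 1.1, 2.5],  # success at Top-1
]

top1 = top_k_success_rate(toy, k=1)
top5 = top_k_success_rate(toy, k=5)
print(f"Top-1: {top1:.0%}, Top-5: {top5:.0%}")
```

On the toy data, two of four cases succeed at Top-1 and three of four at Top-5, which illustrates why a Top-5 rate can never be lower than the corresponding Top-1 rate.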