Datasets used for molecular docking, such as PDBBind, contain technical variability; that is, they are noisy. Although the origins of this noise have been discussed, a comprehensive analysis of the physical, chemical, and bioactivity characteristics of these datasets is still lacking. To address this gap, we introduce the Comprehensive Accurate Assessment (Compass). Compass integrates two key components: PoseCheck, which examines ligand strain energy, protein-ligand steric clashes, and interactions; and AA-Score, a new empirical scoring function for calculating binding affinity energy. Together, these form a unified workflow that assesses both the physical/chemical properties and the bioactivity favorability of ligands and protein-ligand interactions. Our analysis of the PDBBind dataset using Compass reveals substantial noise in the ground-truth data. Additionally, we propose CompassDock, which combines the Compass module with DiffDock, the state-of-the-art deep learning-based molecular docking method, to enable accurate assessment of docked ligands during inference. Finally, we present a new paradigm for enhancing molecular docking model performance by fine-tuning with Compass Scores, which comprise the binding affinity energy, the strain energy, and the number of steric clashes identified by Compass. Our results show that fine-tuning without Compass increases the percentage of docked poses with RMSD < 2 Å but decreases physical/chemical and bioactivity favorability. In contrast, fine-tuning with Compass yields only a limited improvement in the percentage of poses with RMSD < 2 Å but enhances the physical/chemical and bioactivity favorability of the ligand conformation. The source code is publicly available at https://github.com/BIMSBbioinfo/CompassDock.
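The headline metric in the abstract, the percentage of docked poses with RMSD < 2 Å, can be illustrated with a minimal Python sketch. This is not code from the CompassDock repository: the function names `rmsd` and `success_rate` are hypothetical, and the sketch assumes that predicted and reference ligand atoms are given in the same order and the same coordinate frame, with no superposition and no correction for molecular symmetry.

```python
import math


def rmsd(pred, ref):
    """RMSD in Å between two equally ordered lists of (x, y, z) coordinates.

    Assumption: atoms are already matched one-to-one and share a frame;
    no alignment or symmetry correction is applied.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def success_rate(rmsds, threshold=2.0):
    """Percentage of poses whose RMSD is strictly below the threshold (in Å)."""
    if not rmsds:
        raise ValueError("no RMSD values given")
    return 100.0 * sum(r < threshold for r in rmsds) / len(rmsds)
```

For example, `success_rate([0.5, 1.9, 2.0, 3.1])` counts the first two poses as successes, since the comparison is strict. A pose can pass this geometric threshold and still be strained or clash with the protein, which is the gap the Compass Scores are meant to expose.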