Fast screening of drug molecules based on ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprints are a promising method for developing molecular docking surrogates with high throughput and high fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models predict docking scores with high accuracy, with a mean squared error lower than $0.21$ kcal/mol for most of the docking targets, a significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable to unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With accuracy comparable to that of target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond the COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.
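To make the idea of a graph neural fingerprint concrete, the following is a minimal sketch in NumPy of one common formulation: atom features are repeatedly aggregated over bonded neighbors, and each layer contributes a softmax-pooled vector to a fixed-length fingerprint. This is an illustrative assumption, not the architecture used in this study. The function names, layer sizes, and the random, untrained weights are all hypothetical, and the toy molecule is a three-atom chain with made-up features.

```python
import numpy as np


def neural_fingerprint(atom_features, adjacency, layer_weights, readout_weights):
    """Toy graph neural fingerprint (hypothetical sketch, untrained weights)."""
    h = atom_features
    fingerprint = np.zeros(readout_weights[0].shape[1])
    # Add self-loops so each atom keeps its own features when aggregating.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    for w_hidden, w_out in zip(layer_weights, readout_weights):
        # Message passing: sum features over each atom and its neighbors.
        h = np.tanh(a_hat @ h @ w_hidden)
        # Soft "hashing": softmax over fingerprint positions for every atom.
        logits = h @ w_out
        logits = logits - logits.max(axis=1, keepdims=True)
        soft = np.exp(logits)
        soft = soft / soft.sum(axis=1, keepdims=True)
        # Summing over atoms makes the result independent of atom ordering.
        fingerprint += soft.sum(axis=0)
    return fingerprint


def mean_squared_error(y_true, y_pred):
    """Mean squared error between reference and predicted docking scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


rng = np.random.default_rng(0)
n_atoms, n_feat, n_hidden, fp_len = 3, 4, 8, 16

# Hypothetical three-atom chain: atom 0 - atom 1 - atom 2.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
atom_features = rng.normal(size=(n_atoms, n_feat))

# Two message-passing layers with random (untrained) weights.
layer_weights = [rng.normal(size=(n_feat, n_hidden)),
                 rng.normal(size=(n_hidden, n_hidden))]
readout_weights = [rng.normal(size=(n_hidden, fp_len)),
                   rng.normal(size=(n_hidden, fp_len))]

fp = neural_fingerprint(atom_features, adjacency, layer_weights, readout_weights)

# Relabeling the atoms should leave the fingerprint unchanged.
perm = np.array([2, 0, 1])
fp_perm = neural_fingerprint(atom_features[perm],
                             adjacency[perm][:, perm],
                             layer_weights, readout_weights)

# Made-up docking scores, used only to illustrate the error metric.
mse = mean_squared_error([-7.0, -8.5], [-7.5, -8.0])
```

In a docking surrogate of this kind, a regression head would map `fp` to a predicted docking score, and the weights would be fitted by minimizing a loss such as `mean_squared_error`. Unlike a circular fingerprint, whose hashing step is fixed, every step here is differentiable, so the fingerprint itself can be learned from the docking data.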