Fast screening of drug molecules based on ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprints are a promising method for building molecular docking surrogates with high throughput and high fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models predict docking scores accurately, with a mean squared error below $0.21$ (kcal/mol)$^2$ for most docking targets, a significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable to unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With accuracy comparable to that of target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond the COVID-19 dataset: our approach to fast virtual ligand screening can be easily adapted and integrated into a general machine-learning-accelerated pipeline to battle future bio-threats.
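The abstract does not spell out the model architecture, so the following is only a minimal sketch of the general neural fingerprint idea it contrasts with circular fingerprints: atoms exchange information with their bonded neighbours, and a smooth softmax read-out takes the place of the hash-and-index step of a circular fingerprint. Everything here is an assumption made for illustration (the function names, the one-hot atom features, the `tanh` nonlinearity, the random untrained weights, and the three-atom toy molecule); it is not the authors' implementation.

```python
import math
import random


def matvec(vec, matrix):
    """Row vector (length d) times a d x k matrix stored as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]


def softmax(values):
    """Numerically stable softmax; the result sums to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def neural_fingerprint(atom_features, bonds, hidden_weights, output_weights):
    """Toy graph neural fingerprint: one softmax read-out per atom per layer."""
    n_atoms = len(atom_features)
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    fingerprint = [0.0] * len(output_weights[0][0])
    states = [list(f) for f in atom_features]
    for hidden, output in zip(hidden_weights, output_weights):
        # Message passing: each atom sums its own state with its neighbours'.
        pooled = []
        for a in range(n_atoms):
            vec = list(states[a])
            for nb in neighbors[a]:
                vec = [x + y for x, y in zip(vec, states[nb])]
            pooled.append(vec)
        states = [[math.tanh(x) for x in matvec(vec, hidden)] for vec in pooled]
        # Read-out: a soft, differentiable analogue of setting a hashed bit.
        for state in states:
            for i, p in enumerate(softmax(matvec(state, output))):
                fingerprint[i] += p
    return fingerprint


# Toy molecule: heavy atoms C-C-O with one-hot features [is_C, is_O].
rng = random.Random(0)
hidden = [random_matrix(2, 2, rng) for _ in range(2)]   # two layers
output = [random_matrix(2, 4, rng) for _ in range(2)]   # fingerprint length 4
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
fp = neural_fingerprint(features, bonds, hidden, output)

# Same molecule with the atoms listed in a different order (O-C-C).
fp_permuted = neural_fingerprint(
    [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]], [(0, 1), (1, 2)], hidden, output)
print([round(x, 3) for x in fp])
```

Because every step is a sum over atoms or neighbours, the resulting vector does not depend on atom ordering, and because every step is differentiable, the weights could in principle be trained end to end against docking scores with a regression head on top of the fingerprint.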