The precise prediction of molecular properties is essential for advances in drug development, particularly in virtual screening and compound optimization. Numerous recently introduced deep learning methods have shown remarkable potential for molecular property prediction (MPP), improving both predictive accuracy and insight into molecular structure. Yet two critical questions arise: does integrating domain knowledge improve the accuracy of MPP, and does multi-modal data fusion yield more precise results than methods that rely on a single data source? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods across a range of benchmarks. We find that integrating molecular domain knowledge significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured as reductions in root mean square error (RMSE), reach up to 4.0%, while classification improvements, measured by the area under the receiver operating characteristic curve (ROC-AUC), reach up to 1.7%. We also find that enriching 2D molecular graphs with 1D SMILES strings boosts multi-modal learning performance on regression tasks by up to 9.1%, and that augmenting 2D graphs with 3D structural information improves performance on classification tasks by up to 13.2%, measured by RMSE and ROC-AUC, respectively. These two consolidated insights offer crucial guidance for future advances in drug discovery.
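To make the reported percentages concrete, the sketch below computes the two metrics named above and expresses a change in each as a relative percentage. It assumes the improvements are relative to a baseline model's score, which the abstract does not state explicitly. The function names and all numbers in it are hypothetical and illustrative; they are not the review's code or results.

```python
import math

# Hypothetical helpers illustrating the two metrics named in the abstract.
# Assumption: reported improvements are relative to the baseline score.

def rmse(y_true, y_pred):
    """Root mean square error for a regression task (lower is better)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive is scored above a
    random negative (ties count as 1/2); higher is better."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Compare every positive/negative pair (O(n_pos * n_neg)).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse_reduction_pct(baseline, improved):
    """Relative RMSE reduction in percent (positive means better)."""
    return 100.0 * (baseline - improved) / baseline

def auc_gain_pct(baseline, improved):
    """Relative ROC-AUC increase in percent (positive means better)."""
    return 100.0 * (improved - baseline) / baseline

if __name__ == "__main__":
    # Toy inputs only, not data from the review.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(round(rmse_reduction_pct(1.00, 0.96), 1))  # 4.0
```

Under this relative reading, a 4.0% RMSE reduction corresponds, for example, to a baseline RMSE of 1.00 falling to 0.96. The pairwise ROC-AUC definition is quadratic in the number of samples and is shown only to make the metric concrete. For real benchmarks, a library routine such as scikit-learn's `roc_auc_score` is the usual choice.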