Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, mono-modal learning is inherently limited because it relies on only one molecular representation. This restricts a comprehensive understanding of drug molecules and makes the resulting models less resilient to data noise. To overcome these limitations, we construct multimodal deep learning models that cover different molecular representations. We convert drug molecules into three representations: SMILES-encoded vectors, ECFP fingerprints, and molecular graphs. To process each modality, we use a Transformer-Encoder, bi-directional gated recurrent units (BiGRU), and a graph convolutional network (GCN), respectively, for feature learning, which enhances the model's capability to acquire complementary, naturally occurring bioinformatic information. We evaluate our triple-modal model on six molecular datasets. In contrast to bi-modal learning models, we adopt five fusion methods to capture modality-specific features and better leverage the contribution of each modality. Our multimodal fused deep learning (MMFDL) models outperform mono-modal models in accuracy, reliability, and noise resistance. Moreover, we demonstrate their generalization ability by predicting binding constants for protein-ligand complexes in the PDBbind refined set. The advantage of the multimodal model lies in its ability to process diverse data sources with appropriate models and suitable fusion methods, which enhances the model's noise resistance while exploiting data diversity.
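The abstract does not name the five fusion methods, so the following is only a rough sketch of two generic strategies that multimodal models commonly use, not the MMFDL implementation. It assumes each of the three branches (Transformer-Encoder on SMILES, BiGRU on ECFP, GCN on molecular graphs) has already produced either a feature vector or a scalar property prediction. Early fusion concatenates the feature vectors. Late fusion takes a weighted sum of the per-branch predictions, with weights set inversely proportional to each branch's validation error. The function names, the inverse-error weighting, and all numbers are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def early_fusion(features: Sequence[Sequence[float]]) -> list[float]:
    """Concatenate per-modality feature vectors into one joint vector.

    In a full model the joint vector would feed a shared prediction head.
    """
    fused: list[float] = []
    for vec in features:
        fused.extend(vec)
    return fused


def inverse_error_weights(val_errors: Sequence[float]) -> list[float]:
    """Hypothetical weighting: 1 / validation error per modality, normalized to sum to 1."""
    if not val_errors or any(e <= 0 for e in val_errors):
        raise ValueError("need at least one strictly positive validation error")
    inverse = [1.0 / e for e in val_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def late_fusion(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-modality predictions (a weighted average when weights sum to 1)."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per modality")
    return sum(p * w for p, w in zip(predictions, weights))


if __name__ == "__main__":
    # Hypothetical outputs of the SMILES (Transformer), ECFP (BiGRU), and graph (GCN) branches.
    branch_features = [[0.1, 0.3], [0.7], [0.2, 0.5]]
    branch_predictions = [1.2, 0.9, 1.5]
    branch_val_errors = [0.5, 1.0, 0.5]

    print(early_fusion(branch_features))
    weights = inverse_error_weights(branch_val_errors)
    print(weights)
    print(late_fusion(branch_predictions, weights))
```

With hypothetical validation errors of 0.5, 1.0, and 0.5, the weights become 0.4, 0.2, and 0.4, so the SMILES and graph branches each count twice as much as the ECFP branch. Early fusion lets a downstream network learn cross-modal interactions, whereas late fusion keeps the branches independent and only combines their outputs.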

翻译：准确预测分子性质是药物发现中一项具有挑战性但至关重要的任务。近年来，许多单模态深度学习方法已成功应用于分子性质预测。然而，单模态学习的固有局限性源于仅依赖一种分子表征模态，这限制了对药物分子的全面理解，并削弱了其对数据噪声的鲁棒性。为克服这些局限性，我们构建了多模态深度学习模型以覆盖不同的分子表征方式。我们将药物分子转化为三种分子表征：SMILES编码向量、ECFP指纹和分子图。为处理模态信息，分别采用Transformer编码器、双向门控循环单元（BiGRU）和图卷积网络（GCN）进行特征学习，从而增强模型获取互补性及天然存在的生物信息学信息的能力。我们在六个分子数据集上评估了所提出的三模态模型。与双模态学习模型不同，我们采用五种融合方法以捕捉特定特征并更好地利用每种模态信息的贡献。与单模态模型相比，我们的多模态融合深度学习（MMFDL）模型在准确性、可靠性和抗噪声能力方面均优于单一模型。此外，我们在PDBbind精炼集中展示了该模型在蛋白质-配体复合分子结合常数预测中的泛化能力。多模态模型的优势在于能够利用合适的模型和适当的融合方法处理多样化数据源，从而在获取数据多样性的同时增强模型的抗噪声能力。