Molecule representation learning is crucial for many downstream applications, such as understanding and predicting molecular properties and side effects. In this paper, we propose a novel method called GODE, which accounts for the two-level structure of molecules: each molecule has its own intrinsic graph structure and is also a node in a larger molecule knowledge graph. GODE integrates graph representations of individual molecules with multidomain biochemical data from knowledge graphs. GODE pre-trains two graph neural networks (GNNs), one on each kind of graph structure, and combines them through contrastive learning. In this way it fuses each molecule's structure with its corresponding knowledge-graph substructure. The fused representation is more robust and informative, and it improves molecular property prediction by drawing on both chemical and biological information. When fine-tuned on 11 chemical property prediction tasks, our model outperforms existing baselines, with an average ROC-AUC improvement of 13.8% on classification tasks and an average RMSE/MAE reduction of 35.1% on regression tasks. It also surpasses the current state-of-the-art model for molecular property prediction, with average improvements of 2.1% on classification and 6.4% on regression tasks.
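The abstract does not state which contrastive objective aligns the two GNNs. The sketch below shows one common choice, a symmetric InfoNCE (CLIP-style) loss, purely as an illustrative assumption. In it, row i of `mol_emb` stands in for the molecular-graph GNN embedding of molecule i, and row i of `kg_emb` stands in for the knowledge-graph GNN embedding of the same molecule. Matching rows are positive pairs, and all other rows in the batch act as negatives. The function names and the `temperature` value are hypothetical and are not taken from GODE.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(mol_emb, kg_emb, temperature=0.1):
    """Symmetric InfoNCE loss (assumed objective, not confirmed by the paper).

    Row i of mol_emb and row i of kg_emb form a positive pair;
    every other row in the batch is treated as a negative.
    """
    z_mol = l2_normalize(mol_emb)
    z_kg = l2_normalize(kg_emb)
    logits = z_mol @ z_kg.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Molecule -> KG direction: each row should pick out its own KG embedding.
    loss_mol_to_kg = -log_softmax(logits, axis=1)[idx, idx].mean()
    # KG -> molecule direction: each column should pick out its own molecule.
    loss_kg_to_mol = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_mol_to_kg + loss_kg_to_mol)


# Toy usage with random stand-in embeddings (not real GNN outputs).
rng = np.random.default_rng(0)
mol = rng.normal(size=(8, 16))
aligned_loss = info_nce(mol, mol + 0.01 * rng.normal(size=(8, 16)))
random_loss = info_nce(mol, rng.normal(size=(8, 16)))
print(aligned_loss, random_loss)
```

When the two views of each molecule agree, the loss is much lower than when the knowledge-graph embeddings are unrelated. Minimizing this loss pulls each molecule's two representations together and pushes apart representations of different molecules in the batch.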