Deep learning in computational biochemistry has traditionally focused on neural representations of molecular graphs; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study the gains in property prediction performance obtained by using contrastive learning to align neural graph representations with representations of textual descriptions of the molecules' characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel, chemically valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain over models pre-trained on the graph modality alone, and a +1.54% gain over the recently proposed MoMu model (Su et al., 2022), which is contrastively trained on molecular graphs and text.
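To make the contrastive alignment of graph and text representations concrete, the following is a minimal sketch of a symmetric InfoNCE-style objective over paired embeddings. It is an illustration only, not the implementation used in this work: the function names, the temperature value of 0.1, and the toy two-dimensional embeddings are all assumptions, and in practice the embeddings would come from a graph encoder and a text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(graph_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss for a batch of paired embeddings.

    graph_emb[i] and text_emb[i] are assumed to describe the same molecule;
    every other pairing in the batch is treated as a negative.
    The temperature default is an illustrative assumption.
    """
    g = [l2_normalize(v) for v in graph_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(g)

    # sim[i][j]: scaled cosine similarity between graph i and text j.
    sim = [[sum(a * b for a, b in zip(gi, tj)) / temperature for tj in t] for gi in g]

    # Graph-to-text direction: each graph should pick out its own text.
    g2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Text-to-graph direction: the same computation on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    t2g = -sum(log_softmax(sim_t[i])[i] for i in range(n)) / n

    return 0.5 * (g2t + t2g)


# Toy batch: two molecules with hypothetical 2-d embeddings.
graphs = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]      # each text matches its graph
mismatched_texts = [[0.0, 1.0], [1.0, 0.0]]   # texts swapped between molecules

aligned_loss = symmetric_contrastive_loss(graphs, aligned_texts)
mismatched_loss = symmetric_contrastive_loss(graphs, mismatched_texts)
```

In this toy batch the aligned pairing yields a much smaller loss than the swapped pairing, which is the signal that drives the graph encoder toward the text-derived property information during pre-training.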