Molecular property prediction with deep learning has attracted considerable attention in recent years. Because labeled molecules are scarce, there is growing interest in self-supervised learning methods that learn generalizable molecular representations from unlabeled data. Molecules are typically modeled as 2D topological graphs, yet their 3D geometry plays a major role in determining molecular function. In this paper, we propose Geometry-aware line graph transformer (Galformer) pre-training, a novel self-supervised learning framework that enhances molecular representation learning with both 2D and 3D modalities. Specifically, we first design a dual-modality line graph transformer backbone that encodes the topological and geometric information of a molecule. The backbone incorporates structural encodings that capture graph structure from both modalities. We then devise two complementary pre-training tasks, one at the inter-modality level and one at the intra-modality level. These tasks provide appropriate supervision signals and extract discriminative 2D and 3D knowledge from unlabeled molecules. Finally, we evaluate Galformer against six state-of-the-art baselines on twelve property prediction benchmarks via downstream fine-tuning. Experimental results show that Galformer consistently outperforms all baselines on both classification and regression tasks, demonstrating its effectiveness.
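To make the term "line graph" concrete, the following is a minimal sketch of the standard graph-theoretic construction: each bond of the molecule becomes a node, and two such nodes are connected when the corresponding bonds share an atom. The function name `line_graph` and the bond-list input format are assumptions made for illustration; the abstract does not specify how Galformer builds or featurizes its line graphs.

```python
from itertools import combinations


def line_graph(bonds):
    """Build the line graph of a molecular graph (illustrative helper).

    bonds: iterable of (atom_i, atom_j) index pairs.
    Returns (nodes, edges), where each node is a bond and each edge
    joins two bonds that share an atom.
    """
    # Normalize so that (i, j) and (j, i) map to the same node.
    nodes = sorted({tuple(sorted(bond)) for bond in bonds})
    # Two bonds are adjacent in the line graph if they share an atom.
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if set(a) & set(b)
    ]
    return nodes, edges


# Heavy-atom skeleton of ethanol: C0-C1-O2
chain_nodes, chain_edges = line_graph([(0, 1), (1, 2)])

# Heavy-atom skeleton of isobutane: central C0 bonded to C1, C2, C3
star_nodes, star_edges = line_graph([(0, 1), (0, 2), (0, 3)])
```

Each edge of the line graph corresponds to a pair of bonds meeting at an atom, that is, to a bond angle. This is one plausible reason a line graph suits geometric information, though whether Galformer uses it in this way is not stated in the abstract.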