Molecular property prediction is an important problem in drug discovery and materials science. Because geometric structure has been shown to be necessary for molecular property prediction, 3D information has been combined with various graph learning methods to boost prediction performance. However, obtaining the geometric structure of molecules is infeasible in many real-world applications because of its high computational cost. In this work, we propose a novel 3D pre-training framework (dubbed 3D PGT), which pre-trains a model on 3D molecular graphs and then fine-tunes it on molecular graphs without 3D structures. Based on the fact that bond length, bond angle, and dihedral angle are three basic geometric descriptors that correspond to a complete 3D molecular conformer, we first develop a multi-task generative pre-training framework built on these three attributes. Next, to fuse the three generative tasks automatically, we design a surrogate metric based on the \textit{total energy} to search for the weight distribution over the three pretext tasks, since total energy corresponds to the quality of a 3D conformer. Extensive experiments on 2D molecular graphs demonstrate the accuracy, efficiency, and generalization ability of the proposed 3D PGT compared with various pre-training baselines.
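As an illustration of the three geometric descriptors named above, the following minimal sketch computes a bond length, a bond angle, and a dihedral angle from atomic coordinates, and combines three per-task losses with normalized weights. It is not the 3D PGT implementation: the function names (`bond_length`, `bond_angle`, `dihedral_angle`, `combine_losses`) are hypothetical, the weighted-sum form of the objective is an assumption, and the total-energy-based weight search is not shown because the abstract does not specify it.

```python
import numpy as np


def bond_length(a, b):
    """Euclidean distance between two bonded atoms."""
    return float(np.linalg.norm(b - a))


def bond_angle(a, b, c):
    """Angle (radians) at the central atom b, formed by bonds b-a and b-c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def dihedral_angle(a, b, c, d):
    """Signed torsion angle (radians) about the b-c bond for atoms a-b-c-d."""
    b1, b2, b3 = b - a, c - b, d - c
    n1 = np.cross(b1, b2)  # normal of the plane through a, b, c
    n2 = np.cross(b2, b3)  # normal of the plane through b, c, d
    m1 = np.cross(n1, b2 / np.linalg.norm(b2))
    return float(np.arctan2(np.dot(m1, n2), np.dot(n1, n2)))


def combine_losses(losses, weights):
    """Weighted sum of pretext-task losses (assumed form).

    The weights are normalized to sum to 1, so they act as a distribution
    over the three tasks.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, np.asarray(losses, dtype=float)))


# Toy four-atom chain in a planar trans arrangement (coordinates are made up).
a = np.array([0.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 0.0])
c = np.array([1.0, 0.0, 0.0])
d = np.array([1.0, -1.0, 0.0])

print(bond_length(b, c))
print(bond_angle(a, b, c))
print(abs(dihedral_angle(a, b, c, d)))
```

In a pre-training setup of this kind, values such as these would serve as regression targets for the three generative tasks, and the weights passed to `combine_losses` would be the quantities selected by the search.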