Learning representations of molecular structures with deep learning is a fundamental problem in molecular property prediction. In the real world, molecules exist as three-dimensional structures; moreover, they are not static but move continuously in 3D Euclidean space on a potential energy surface. It is therefore desirable to generate multiple conformations in advance and extract molecular representations with a 4D-QSAR model that incorporates those conformations. However, this approach is impractical for drug and materials discovery because obtaining multiple conformations is computationally expensive. To address this issue, we propose a pre-training method for molecular graph neural networks (GNNs) that uses an existing dataset of molecular conformations to generate, from a 2D molecular graph, a latent vector that is universal across multiple conformations. Our method, called Boltzmann GNN, is formulated as maximizing the conditional marginal likelihood of a conditional generative model for conformation generation. We show that our model achieves better molecular property prediction performance than existing pre-training methods based on molecular graphs and three-dimensional molecular structures.
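To make "maximizing the conditional marginal likelihood of a conditional generative model" concrete, the block below writes out one generic form of such an objective. It is a sketch under stated assumptions, not necessarily the exact formulation of Boltzmann GNN: the notation $G$ (2D molecular graph), $R$ (a 3D conformation), $z$ (latent vector), the generative parameters $\theta$, and an approximate posterior $q_\phi$ with parameters $\phi$ are all introduced here for illustration. The marginal likelihood integrates the latent out, and because that integral is generally intractable, a standard conditional-VAE-style evidence lower bound is shown as one common way to optimize it.

$$
p_\theta(R \mid G) = \int p_\theta(R \mid z, G)\, p_\theta(z \mid G)\, dz,
\qquad
\max_\theta \sum_{(G, R) \in \mathcal{D}} \log p_\theta(R \mid G)
$$

$$
\log p_\theta(R \mid G) \;\ge\; \mathbb{E}_{q_\phi(z \mid R, G)}\!\left[\log p_\theta(R \mid z, G)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid R, G)\,\middle\|\,p_\theta(z \mid G)\right)
$$

Here $\mathcal{D}$ denotes the (assumed) conformation dataset of graph and conformation pairs. Under this reading, the prior network $p_\theta(z \mid G)$ depends only on the 2D graph, which is what would let a pre-trained encoder produce a conformation-aware latent vector without generating conformations at prediction time.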