Learning representations of molecular structures with deep learning is a fundamental problem in molecular property prediction. Molecules inherently exist in the real world as three-dimensional structures; furthermore, they are not static but in continuous motion in 3D Euclidean space over a potential energy surface. Therefore, it is desirable to generate multiple conformations in advance and extract molecular representations using a 4D-QSAR model that incorporates them. However, this approach is impractical for drug and material discovery because of the computational cost of obtaining multiple conformations. To address this issue, we propose a pre-training method for molecular GNNs that uses an existing dataset of molecular conformations to generate, from a 2D molecular graph, a latent vector universal to multiple conformations. Our method, called Boltzmann GNN, is formulated by maximizing the conditional marginal likelihood of a conditional generative model for conformation generation. We show that our model achieves better prediction performance for molecular properties than existing pre-training methods that use molecular graphs and three-dimensional molecular structures.
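As an illustration of what maximizing a conditional marginal likelihood can look like, the following is a generic sketch for a latent-variable conditional generative model, not the formulation of Boltzmann GNN itself. All symbols are assumptions introduced here: $G$ is the 2D molecular graph, $R_1, \dots, R_K$ are conformations of that molecule from the dataset, $z$ is a single latent vector shared by all $K$ conformations, and $q_\phi$ is a hypothetical variational posterior. The conditional independence of the conformations given $z$ and $G$ is likewise an assumption.

```latex
% Conditional marginal likelihood of K conformations given the 2D graph G,
% assuming one shared latent z and conditional independence given (z, G).
\log p_\theta(R_{1:K} \mid G)
  = \log \int p_\theta(z \mid G) \prod_{k=1}^{K} p_\theta(R_k \mid z, G) \, dz

% Standard variational lower bound (Jensen's inequality) on that quantity,
% with a hypothetical approximate posterior q_\phi(z \mid R_{1:K}, G).
\log p_\theta(R_{1:K} \mid G)
  \ge \mathbb{E}_{q_\phi(z \mid R_{1:K}, G)}
      \left[ \sum_{k=1}^{K} \log p_\theta(R_k \mid z, G) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid R_{1:K}, G) \,\|\, p_\theta(z \mid G) \right)
```

Under these assumptions, the conditional prior $p_\theta(z \mid G)$ depends only on the 2D graph, so a latent vector could be obtained at prediction time without computing any conformation.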