Recent advances in deep learning have substantially accelerated the prediction of quantum chemical (QC) properties by removing the need for expensive electronic structure calculations such as density functional theory (DFT). However, previous methods that learn from 1D SMILES sequences or 2D molecular graphs fail to achieve high accuracy, because QC properties depend primarily on 3D equilibrium conformations optimized by electronic structure methods, which differ substantially from sequence- and graph-type data. In this paper, we propose a novel approach called Uni-Mol+ to tackle this challenge. Uni-Mol+ first generates a raw 3D molecular conformation with an inexpensive method such as RDKit. Neural networks then iteratively update this raw conformation toward its target DFT equilibrium conformation, and the learned conformation is used to predict the QC properties. To learn this update process effectively, we introduce a two-track Transformer backbone and train it on the QC property prediction task. We also design a novel approach to guide the model's training process. Extensive benchmarks show that Uni-Mol+ significantly improves the accuracy of QC property prediction across various datasets. We have made the code and model publicly available at \url{https://github.com/dptech-corp/Uni-Mol}.
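The refine-then-predict pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Uni-Mol+ implementation: coordinates are plain tuples, `update_fn` stands in for the trained two-track Transformer (here a hypothetical toy that moves each atom halfway toward a known target), and `property_head` is a hypothetical readout. In practice the raw conformation would come from a tool such as RDKit, and the target DFT conformation is not available at inference time.

```python
import math
from typing import Callable, List, Tuple

Coords = List[Tuple[float, float, float]]
UpdateFn = Callable[[Coords], Coords]


def refine_conformation(raw: Coords, update_fn: UpdateFn, num_rounds: int = 2) -> Coords:
    """Iteratively add predicted per-atom displacements to a raw conformation."""
    coords = list(raw)
    for _ in range(num_rounds):
        deltas = update_fn(coords)  # one predicted (dx, dy, dz) per atom
        coords = [
            (x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(coords, deltas)
        ]
    return coords


def predict_property(raw: Coords, update_fn: UpdateFn,
                     property_head: Callable[[Coords], float],
                     num_rounds: int = 2) -> float:
    """Refine the conformation first, then read the property off the refined geometry."""
    return property_head(refine_conformation(raw, update_fn, num_rounds))


# --- Hypothetical toy stand-ins, for illustration only ---
# A real model would predict the deltas from learned atom and pair features;
# this toy cheats by looking at a known target to show the loop converging.
TARGET: Coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]


def toy_update(coords: Coords) -> Coords:
    """Hypothetical update: move every atom halfway toward TARGET."""
    return [
        tuple(0.5 * (t - c) for c, t in zip(atom, target))
        for atom, target in zip(coords, TARGET)
    ]


def bond_length(coords: Coords) -> float:
    """Hypothetical property head: distance between atoms 0 and 1."""
    return math.dist(coords[0], coords[1])


raw = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # a cheap initial guess
print(predict_property(raw, toy_update, bond_length, num_rounds=2))  # 1.25
```

Each round adds a correction to the current geometry rather than predicting coordinates from scratch, mirroring the iterative update the abstract describes; the number of rounds here is an arbitrary choice for the sketch.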