Recent advances in equivariant graph neural networks (GNNs) have made it practical to develop fast deep-learning surrogates for expensive ab initio quantum mechanics (QM) methods in molecular potential prediction. However, building accurate and transferable potential models with GNNs remains challenging. Available data are severely limited by the high computational cost and the level of theory of QM methods, especially for large and complex molecular systems. In this work, we propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Specifically, the atomic coordinates of sampled nonequilibrium conformations are perturbed with random noise, and GNNs are pretrained to denoise the perturbed conformations, recovering the original coordinates. Rigorous experiments on multiple benchmarks show that pretraining significantly improves the accuracy of neural potentials. Furthermore, we show that the proposed pretraining approach is model-agnostic, because it improves the performance of different invariant and equivariant GNNs. Notably, our models pretrained on small molecules demonstrate strong transferability, improving performance when fine-tuned on diverse molecular systems, including systems with different elements, charged molecules, biomolecules, and larger systems. These results highlight the potential of denoise pretraining for building more generalizable neural potentials for complex molecular systems.
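The denoising objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes i.i.d. Gaussian noise with a fixed, hypothetical scale `sigma` (in Å) and a noise-prediction target, where the model predicts the per-atom displacement and the original coordinates are recovered by subtracting it. The GNN is replaced by a hypothetical callable `model`, and all function names here are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Coords = List[List[float]]  # one [x, y, z] triple per atom (assumed units: Å)


def perturb_coordinates(coords: Sequence[Sequence[float]], sigma: float,
                        rng: random.Random) -> Tuple[Coords, Coords]:
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every coordinate.

    Returns (noisy_coords, noise); the noise serves as the denoising target
    (an assumed parameterization; the abstract does not fix one).
    """
    noise = [[rng.gauss(0.0, sigma) for _ in atom] for atom in coords]
    noisy = [[x + e for x, e in zip(atom, eps)] for atom, eps in zip(coords, noise)]
    return noisy, noise


def denoising_loss(predicted_noise: Sequence[Sequence[float]],
                   true_noise: Sequence[Sequence[float]]) -> float:
    """Mean squared error between predicted and true per-atom noise vectors."""
    sq = [(p - t) ** 2
          for p_atom, t_atom in zip(predicted_noise, true_noise)
          for p, t in zip(p_atom, t_atom)]
    return sum(sq) / len(sq)


def recover_coordinates(noisy: Sequence[Sequence[float]],
                        predicted_noise: Sequence[Sequence[float]]) -> Coords:
    """Estimate the original conformation by subtracting the predicted noise."""
    return [[x - e for x, e in zip(atom, eps)]
            for atom, eps in zip(noisy, predicted_noise)]


def pretraining_step(model: Callable[[Coords], Coords],
                     coords: Sequence[Sequence[float]],
                     sigma: float, rng: random.Random) -> float:
    """Evaluate the denoising objective for one conformation.

    `model` is a hypothetical stand-in for a GNN that maps noisy coordinates
    to predicted per-atom noise; a real setup would backpropagate this loss.
    """
    noisy, noise = perturb_coordinates(coords, sigma, rng)
    return denoising_loss(model(noisy), noise)
```

For example, with an approximate water geometry and `sigma=0.1`, subtracting the exact noise from the perturbed coordinates returns the original conformation. A model that always predicts zero noise incurs a positive loss. Predicting the noise rather than the clean coordinates is one common choice; the two are equivalent up to subtracting the input.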