Interatomic potentials learned using machine learning methods have been successfully applied to atomistic simulations. However, deep learning pipelines are notoriously data-hungry, while generating reference calculations is computationally demanding. To overcome this difficulty, we propose a transfer learning algorithm that leverages the ability of graph neural networks (GNNs) to describe chemical environments, together with kernel mean embeddings. We extract a feature map from GNNs pre-trained on the OC20 dataset and use it to learn the potential energy surface from system-specific datasets of catalytic processes. Our method is further enhanced by a flexible kernel function that incorporates chemical species information, resulting in improved performance and interpretability. We test our approach on a series of realistic datasets of increasing complexity, showing excellent generalization and transferability, and improving on methods that rely on GNNs or ridge regression alone, as well as on similar fine-tuning approaches. We make the code available to the community at https://github.com/IsakFalk/atomistic_transfer_mekrr.
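The idea of combining per-atom GNN features with kernel mean embeddings and ridge regression can be illustrated with a minimal sketch. This is not the authors' implementation: the random arrays stand in for features that would come from a pre-trained GNN, and the Gaussian kernel, the species indicator (atoms only contribute when their species match), the function names, and the hyperparameters `sigma` and `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel between rows of A (n, d) and rows of B (m, d).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mean_embedding_kernel(feats_a, species_a, feats_b, species_b, sigma=1.0):
    # Kernel between two configurations: the average of an atom-level kernel
    # over all atom pairs (inner product of the two kernel mean embeddings).
    # Assumption: species enter through an indicator that zeroes out pairs
    # of atoms with different atomic numbers.
    K = gaussian_kernel(feats_a, feats_b, sigma)
    same = species_a[:, None] == species_b[None, :]
    return float((K * same).mean())

def gram(configs_a, configs_b, sigma=1.0):
    # Each configuration is a (features, species) pair.
    return np.array([[mean_embedding_kernel(fa, za, fb, zb, sigma)
                      for fb, zb in configs_b]
                     for fa, za in configs_a])

def fit_krr(configs, energies, lam=1e-3, sigma=1.0):
    # Kernel ridge regression: solve (K + lam * I) alpha = y.
    K = gram(configs, configs, sigma)
    return np.linalg.solve(K + lam * np.eye(len(configs)), energies)

def predict(train_configs, alpha, test_configs, sigma=1.0):
    return gram(test_configs, train_configs, sigma) @ alpha

# Toy data: random vectors as placeholders for pre-trained GNN atom features.
rng = np.random.default_rng(0)

def toy_config(n_atoms, dim=8):
    feats = rng.normal(size=(n_atoms, dim))
    species = rng.choice(np.array([1, 8, 29]), size=n_atoms)  # H, O, Cu
    return feats, species

train = [toy_config(n) for n in (4, 6, 5, 7)]
energies = rng.normal(size=len(train))  # placeholder reference energies
alpha = fit_krr(train, energies)
print(predict(train, alpha, [toy_config(5)]))
```

Because each configuration is summarized by an average over its atoms, configurations with different numbers of atoms can be compared directly, and only the small linear system for `alpha` has to be solved on the system-specific data.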