Neural force fields (NFFs) have gained prominence in computational chemistry as surrogate models that replace quantum-chemistry calculations in ab initio molecular dynamics. The prevalent benchmark for NFFs has been the MD17 dataset and its subsequent extension. These datasets predominantly comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampled from direct adiabatic dynamics. However, many chemical reactions entail significant molecular deformations, notably bond breaking. We show that the distributions of internal coordinates and energies in the MD17 datasets are narrow, which makes these datasets inadequate for representing systems undergoing chemical reactions. To address this sampling limitation, we introduce the xxMD (Extended Excited-state Molecular Dynamics) dataset, derived from non-adiabatic dynamics. This dataset contains energies and forces computed with both multireference wave function theory and density functional theory. Furthermore, its nuclear configuration spaces faithfully represent chemical reactions, making xxMD a more chemically relevant dataset. Re-evaluating equivariant models on the xxMD datasets yields notably higher mean absolute errors than those reported for MD17 and its variants. This observation underscores the difficulty of building a generalizable NFF model with extrapolation capability. The xxMD-CASSCF and xxMD-DFT datasets are available at https://github.com/zpengmei/xxMD.
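The two quantities the abstract relies on, the spread of internal coordinates in a dataset and the mean absolute error of a model, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the function names are hypothetical, the geometries are synthetic diatomic frames rather than MD17 or xxMD data, and interatomic distances stand in for the internal coordinates analysed in the paper.

```python
import numpy as np

def pairwise_distances(coords):
    """Return interatomic distances with shape (n_frames, n_pairs).

    coords: array of shape (n_frames, n_atoms, 3), Cartesian positions.
    """
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(coords.shape[1], k=1)  # unique atom pairs i < j
    return np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=-1)

def distance_spread(coords):
    """Per-pair range (max - min) of each distance over all frames."""
    d = pairwise_distances(coords)
    return d.max(axis=0) - d.min(axis=0)

def mean_absolute_error(pred, ref):
    """MAE between predicted and reference values (energies or forces)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))

def diatomic_frames(bond_lengths):
    """Synthetic two-atom frames with the bond along the x axis."""
    frames = np.zeros((len(bond_lengths), 2, 3))
    frames[:, 1, 0] = bond_lengths
    return frames

# Synthetic data only: a near-equilibrium set versus a bond-breaking set.
near_equilibrium = diatomic_frames([1.0, 1.05, 1.1])
bond_breaking = diatomic_frames([1.0, 2.0, 3.0])

spread_eq = distance_spread(near_equilibrium)
spread_rx = distance_spread(bond_breaking)
```

In this toy comparison the bond-breaking set covers a much wider distance range than the near-equilibrium set, which is the kind of difference in coverage the abstract describes between reactive and equilibrium sampling.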