xxMD: Benchmarking Neural Force Fields Using Extended Dynamics beyond Equilibrium

Neural force fields (NFFs) have gained prominence in computational chemistry as surrogate models that replace quantum-chemistry calculations in ab initio molecular dynamics. The prevalent benchmark for NFFs has been the MD17 dataset and its subsequent extension. These datasets predominantly comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampled from direct adiabatic dynamics. However, many chemical reactions entail significant molecular deformations, notably bond breaking. We demonstrate that the distributions of internal coordinates and energies in the MD17 datasets are constrained, which makes them inadequate for representing systems undergoing chemical reactions. To address this sampling limitation, we introduce the xxMD (Extended Excited-state Molecular Dynamics) dataset, derived from non-adiabatic dynamics. This dataset contains energies and forces computed with both multireference wave function theory and density functional theory. Furthermore, its nuclear configuration spaces authentically depict chemical reactions, making xxMD a more chemically relevant dataset. Our re-assessment of equivariant models on the xxMD datasets reveals notably higher mean absolute errors than those reported for MD17 and its variants. This observation underscores the challenges of building a generalizable NFF model with extrapolation capability. The xxMD-CASSCF and xxMD-DFT datasets are available at https://github.com/zpengmei/xxMD.
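
The two quantities the abstract relies on, the mean absolute error of predicted energies and forces and the spread of an internal coordinate across sampled geometries, can be illustrated with a minimal sketch. It assumes NumPy arrays of hypothetical toy data; the helper names `mean_absolute_error` and `pair_distances`, the array shapes, and the numbers are illustrative assumptions and do not reflect the actual xxMD file layout or the authors' evaluation code.

```python
import numpy as np


def mean_absolute_error(predicted, reference):
    """Mean absolute error over all array elements (hypothetical helper)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))


def pair_distances(positions, i, j):
    """Distance between atoms i and j in every frame.

    positions is assumed to have shape (n_frames, n_atoms, 3).
    """
    positions = np.asarray(positions, dtype=float)
    return np.linalg.norm(positions[:, i, :] - positions[:, j, :], axis=-1)


# Toy reference and predicted energies, one value per frame (made-up numbers).
ref_energy = np.array([0.0, 1.0, 2.0])
pred_energy = np.array([0.5, 1.0, 1.5])
energy_mae = mean_absolute_error(pred_energy, ref_energy)

# Toy forces with shape (n_frames, n_atoms, 3); the error is averaged
# over every Cartesian component of every atom in every frame.
ref_forces = np.zeros((2, 2, 3))
pred_forces = np.full((2, 2, 3), 0.1)
force_mae = mean_absolute_error(pred_forces, ref_forces)

# Two toy frames of a diatomic: one near a short bond length, one stretched.
positions = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
])
bond = pair_distances(positions, 0, 1)
bond_range = (float(bond.min()), float(bond.max()))
```

Comparing `bond_range` (or a histogram of `bond`) between two datasets is one simple way to see whether one of them covers stretched, bond-breaking geometries that the other lacks.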
