Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and primarily probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES via controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability while requiring only a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we use an unconstrained Transformer backbone as a testbed, illustrating how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce the artifacts identified by our metric. By systematically optimizing the model design against BSCT, we obtain an MLIP that simultaneously achieves low conventional energy and force (E/F) regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT both as a validation metric and as an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that current MLIP benchmarks cannot efficiently evaluate.
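The abstract describes BSCT only at a high level. As a rough illustration of the kind of check a controlled bond deformation enables, the Python/NumPy sketch below scans a single bond length on a toy one-dimensional energy curve and flags the three artifact types named above. It is not the BSCT procedure from this work. The Morse reference curve, the flawed `glitchy_energy` model, the function `bond_scan_report`, and the thresholds `jump_tol` and `force_tol` are all hypothetical choices made for illustration. "Spurious forces" is interpreted here, as an assumption, to mean predicted forces that disagree with the finite-difference slope of the predicted energy. In a real test one would place atoms at each bond length and query the MLIP for energies and forces.

```python
import numpy as np


def morse_energy(r, d_e=4.7, a=1.9, r_e=0.74):
    """Smooth reference PES (hypothetical example): Morse potential E(r), illustrative eV/Angstrom units."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2


def morse_force(r, d_e=4.7, a=1.9, r_e=0.74):
    """Analytic radial force F(r) = -dE/dr of the Morse potential."""
    x = np.exp(-a * (r - r_e))
    return -2.0 * d_e * a * (1.0 - x) * x


def glitchy_energy(r):
    """Hypothetical flawed model: a hard-cutoff step at r = 2.2 and a spurious dip near r = 1.2."""
    return morse_energy(r) + 0.05 * (r > 2.2) - 0.5 * np.exp(-(((r - 1.2) / 0.05) ** 2))


def bond_scan_report(energy_fn, force_fn, r_min=0.5, r_max=3.0, n=501,
                     jump_tol=5.0, force_tol=0.05):
    """Scan one bond length and flag discontinuities, extra minima, and force/energy mismatches.

    All names and thresholds are hypothetical; this is a sketch, not the BSCT implementation.
    """
    r = np.linspace(r_min, r_max, n)
    dr = r[1] - r[0]
    e = np.asarray(energy_fn(r), dtype=float)
    f = np.asarray(force_fn(r), dtype=float)

    # 1) Discontinuities: an energy step much larger than |F| * dr predicts for that interval.
    expected_step = 0.5 * (np.abs(f[:-1]) + np.abs(f[1:])) * dr
    jump_idx = np.nonzero(np.abs(np.diff(e)) > jump_tol * expected_step + 1e-8)[0]

    # 2) Minima: strict interior local minima of E along the scan
    #    (a simple bond stretch is typically expected to show a single one).
    min_idx = np.nonzero((e[1:-1] < e[:-2]) & (e[1:-1] < e[2:]))[0] + 1

    # 3) Spurious forces: predicted F vs. -dE/dr from central finite differences (interior points only).
    fd_force = -(e[2:] - e[:-2]) / (2.0 * dr)
    force_err = np.abs(f[1:-1] - fd_force)

    return {
        "jump_locations": 0.5 * (r[jump_idx] + r[jump_idx + 1]),
        "minima_locations": r[min_idx],
        "max_force_error": float(force_err.max()),
        "n_force_mismatches": int(np.count_nonzero(force_err > force_tol)),
    }


clean = bond_scan_report(morse_energy, morse_force)
flawed = bond_scan_report(glitchy_energy, morse_force)
print(len(clean["jump_locations"]), len(clean["minima_locations"]), clean["n_force_mismatches"])
print(flawed["jump_locations"], flawed["minima_locations"], flawed["n_force_mismatches"])
```

With the smooth Morse curve, the scan should report no jumps, no force mismatches, and a single minimum near r = 0.74. The flawed model, whose forces ignore its own energy artifacts, should show a jump near r = 2.2, an extra minimum near r = 1.2, and force mismatches around both artifacts. Scaling the jump threshold by the local force keeps steep but smooth regions from being flagged as discontinuities.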