Efficiently creating a concise but comprehensive data set for training machine-learned interatomic potentials (MLIPs) is an under-explored problem. Active learning, which uses biased or unbiased molecular dynamics (MD) to generate candidate pools, aims to address this problem. Existing biased and unbiased MD-simulation methods, however, are prone to miss either rare events or extrapolative regions -- areas of the configurational space where unreliable predictions are made. This work demonstrates that MD, when biased by the MLIP's energy uncertainty, simultaneously captures extrapolative regions and rare events, which is crucial for developing uniformly accurate MLIPs. Furthermore, exploiting automatic differentiation, we enhance bias-forces-driven MD with the concept of bias stress. We employ calibrated gradient-based uncertainties to yield MLIPs with similar or, sometimes, better accuracy than ensemble-based methods at a lower computational cost. Finally, we apply uncertainty-biased MD to alanine dipeptide and MIL-53(Al), generating MLIPs that represent both configurational spaces more accurately than models trained with conventional MD.
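The central idea, biasing the dynamics by subtracting a scaled uncertainty from the MLIP energy so that the simulation is pulled toward both high-uncertainty (extrapolative) regions and rare barrier crossings, can be illustrated with a minimal one-dimensional sketch. Everything below is a hypothetical stand-in: a double-well function plays the role of the MLIP energy, a Gaussian bump plays the role of its predicted uncertainty, and a central finite difference substitutes for the automatic differentiation that the paper uses to obtain bias forces.

```python
import math
import random

# Hypothetical stand-ins, not the paper's actual models: a double-well
# "MLIP energy" with minima at x = +/-1, and a Gaussian "uncertainty"
# peaked at the barrier (x = 0), so the bias favors both the rare
# barrier crossing and the high-uncertainty region.
def energy(x):
    return (x * x - 1.0) ** 2

def uncertainty(x):
    return math.exp(-4.0 * x * x)

def biased_energy(x, kappa=0.5):
    # Subtracting kappa * sigma(x) lowers the surface where the model is
    # uncertain, so the dynamics preferentially visits those regions.
    return energy(x) - kappa * uncertainty(x)

def force(x, h=1e-5):
    # Central finite difference as a stand-in for automatic
    # differentiation: F = -dE_bias/dx.
    return -(biased_energy(x + h) - biased_energy(x - h)) / (2.0 * h)

def langevin_step(x, dt=1e-3, temperature=0.1, rng=random):
    # Overdamped Langevin (Euler-Maruyama) step, a minimal MD surrogate.
    noise = math.sqrt(2.0 * temperature * dt) * rng.gauss(0.0, 1.0)
    return x + dt * force(x) + noise
```

Here the unbiased force near a well pushes the trajectory back toward the minimum; with the bias switched on, that restoring force is weakened in the direction of the uncertain barrier, which is the mechanism by which uncertainty-biased MD reaches configurations that conventional MD would rarely sample.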