Molecular dynamics (MD) employing machine-learned interatomic potentials (MLIPs) serves as an efficient, urgently needed complement to ab initio molecular dynamics (aiMD). When these potentials are trained on data generated from ab initio methods, their averaged predictions can achieve performance comparable to ab initio methods at a fraction of the cost. However, insufficient training sets can lead to an improper description of the dynamics in strongly anharmonic materials: critical effects may be overlooked in relevant cases, captured only incorrectly, or hallucinated by the MLIP when they are not actually present. In this work, we show that an active learning scheme that combines MD with MLIPs (MLIP-MD) and uncertainty estimates can avoid such problematic predictions. In short, efficient MLIP-MD is used to explore configuration space quickly, while an acquisition function based on uncertainty estimates and on energetic viability maximizes the value of the newly generated data and focuses sampling on the most unfamiliar but reasonably accessible regions of phase space. To verify our methodology, we screen over 112 materials and identify 10 examples experiencing the aforementioned problems. Using CuI and AgGaSe$_2$ as archetypes for these problematic materials, we discuss the physical implications for strongly anharmonic effects and demonstrate how the developed active learning scheme can address these issues.
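The abstract describes the acquisition function only qualitatively, so the following is a minimal sketch of one way such a function could be built, not the scheme used in this work. It assumes that the uncertainty estimate is the disagreement (standard deviation) of an ensemble of MLIPs, that energetic viability is a Boltzmann-like weight on the predicted energy per atom above the lowest-energy candidate, and that the two are combined as a product. The function names, the ensemble approach, and the product form are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, pstdev

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acquisition_scores(ensemble_energies, temperature_k, n_atoms):
    """Score candidate configurations visited during MLIP-MD (illustrative only).

    ensemble_energies[m][c] is the total energy in eV that ensemble member m
    predicts for candidate configuration c. Both the ensemble-based
    uncertainty and the Boltzmann-like viability weight are assumptions.
    """
    per_config = list(zip(*ensemble_energies))   # regroup predictions by configuration
    means = [mean(col) for col in per_config]    # ensemble-averaged energy
    sigmas = [pstdev(col) for col in per_config] # ensemble disagreement as uncertainty
    e_min = min(means)
    kt = K_B_EV_PER_K * temperature_k
    scores = []
    for mu, sigma in zip(means, sigmas):
        delta_per_atom = (mu - e_min) / n_atoms  # energy above the best candidate
        viability = math.exp(-delta_per_atom / kt)
        # High score: unfamiliar (large sigma) yet energetically accessible.
        scores.append(sigma * viability)
    return scores


def select_for_labeling(scores, n_select):
    """Return indices of the n_select highest-scoring configurations."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_select]


# Toy data: 3 ensemble members, 4 candidate configurations of a 2-atom cell.
toy_energies = [
    [-10.00, -9.98, -5.0, -9.99],
    [-10.00, -9.90, -3.0, -9.97],
    [-10.00, -9.94, -4.0, -9.98],
]
toy_scores = acquisition_scores(toy_energies, temperature_k=300.0, n_atoms=2)
chosen = select_for_labeling(toy_scores, n_select=2)
```

In this toy example, configuration 2 has the largest ensemble disagreement but lies several eV per atom above the others, so its viability weight suppresses it. Configuration 0 is energetically favorable but has no disagreement, so it also scores low. The configurations that would be sent for ab initio labeling are the ones that are both uncertain and reachable at the simulated temperature.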