Accelerating the Training and Improving the Reliability of Machine-Learned Interatomic Potentials for Strongly Anharmonic Materials through Active Learning


September 18, 2024


Kisung Kang, Thomas A. R. Purcell, Christian Carbogno, Matthias Scheffler

From arXiv; 15 pages, 13 figures

Molecular dynamics (MD) employing machine-learned interatomic potentials (MLIPs) serves as an efficient, urgently needed complement to ab initio molecular dynamics (aiMD). When these potentials are trained on data generated with ab initio methods, their averaged predictions can reach performance comparable to ab initio methods at a fraction of the cost. However, insufficient training sets might lead to an improper description of the dynamics in strongly anharmonic materials, because critical effects might be overlooked in relevant cases, captured only incorrectly, or hallucinated by the MLIP when they are not actually present. In this work, we show that an active learning scheme that combines MD with MLIPs (MLIP-MD) and uncertainty estimates can avoid such problematic predictions. In short, efficient MLIP-MD is used to explore configuration space quickly, whereby an acquisition function based on uncertainty estimates and on energetic viability is employed to maximize the value of the newly generated data and to focus on the most unfamiliar but reasonably accessible regions of phase space. To verify our methodology, we screen over 112 materials and identify 10 examples experiencing the aforementioned problems. Using CuI and AgGaSe$_2$ as archetypes for these problematic materials, we discuss the physical implications for strongly anharmonic effects and demonstrate how the developed active learning scheme can address these issues.
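The abstract does not give the functional form of the acquisition function, so the following is only a minimal illustrative sketch and not the authors' implementation. It assumes that uncertainty is estimated as the disagreement (standard deviation) among energies predicted by a small ensemble of MLIPs, and that "energetic viability" is a simple cutoff on the mean predicted energy relative to the lowest-energy frame. The function name `select_candidates`, the frame layout, and the parameters `energy_window` and `n_select` are all hypothetical.

```python
from statistics import mean, pstdev


def select_candidates(frames, energy_window, n_select):
    """Pick MD frames to label with ab initio calculations (hypothetical sketch).

    frames        : list of dicts with an "id" and "energies", the energies
                    (e.g. eV/atom) predicted for that frame by each ensemble member.
    energy_window : assumed viability cutoff; frames whose mean energy lies more
                    than this above the lowest mean energy are discarded.
    n_select      : number of frames to return.
    """
    if not frames:
        return []

    # Score every frame: ensemble mean (energy estimate) and
    # ensemble standard deviation (assumed uncertainty estimate).
    scored = [
        (frame["id"], mean(frame["energies"]), pstdev(frame["energies"]))
        for frame in frames
    ]

    # Energetic viability: keep only reasonably accessible frames.
    e_ref = min(e_mean for _, e_mean, _ in scored)
    viable = [s for s in scored if s[1] - e_ref <= energy_window]

    # Among viable frames, prefer the most unfamiliar (most uncertain) ones.
    viable.sort(key=lambda s: s[2], reverse=True)
    return [frame_id for frame_id, _, _ in viable[:n_select]]


if __name__ == "__main__":
    frames = [
        {"id": "a", "energies": [0.00, 0.01, -0.01]},  # familiar, low energy
        {"id": "b", "energies": [0.10, 0.30, -0.10]},  # uncertain, accessible
        {"id": "c", "energies": [2.00, 3.00, 1.00]},   # uncertain, too high in energy
        {"id": "d", "energies": [0.05, 0.08, 0.02]},   # mildly uncertain
    ]
    print(select_candidates(frames, energy_window=0.5, n_select=2))
```

In this toy input, frame `c` has the largest ensemble disagreement but is rejected by the energy cutoff, which mirrors the stated goal of focusing on unfamiliar yet reasonably accessible regions. In an active learning loop, the selected frames would be recomputed with the ab initio method, added to the training set, and the MLIP retrained.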

