Machine learning (ML) techniques and atomistic modeling have rapidly transformed materials design and discovery. In particular, generative models can swiftly propose promising materials for targeted applications. However, the properties that generative models predict for these materials often do not match those obtained from ab initio calculations. This discrepancy can arise because the generated coordinates are not fully relaxed, whereas many properties are derived from relaxed structures. Neural network-based potentials (NNPs) can expedite the process by relaxing the initially generated structures. Nevertheless, acquiring data to train NNPs for this purpose can be extremely challenging, as the data must encompass previously unknown structures. In this study, we used extended ensemble molecular dynamics (MD) to sample a broad range of liquid- and solid-phase configurations of one metallic system, nickel. We then significantly reduced the size of this data set through active learning without losing much accuracy. We found that the NNP trained on the distilled data could predict different energy-minimized close-packed crystal structures even though those structures were not explicitly part of the initial data. Furthermore, the data can be translated to other metallic systems (aluminum and niobium) without repeating the sampling and distillation processes. Our approach to data acquisition and distillation demonstrates the potential to expedite NNP development and, by integrating generative models, to enhance materials design and discovery.
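The distillation step summarized above can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so everything here is an assumption made for illustration: a labeled pool of configurations is reduced by repeatedly retraining a surrogate on the kept subset and adding the worst-predicted configurations. A polynomial stands in for the NNP, a one-dimensional descriptor `x` stands in for an atomic configuration, and `distill`, `tol`, `batch`, and `budget` are hypothetical names, not part of the study.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def distill(x, y, degree=7, n_init=10, batch=4, tol=1e-2, budget=60, seed=0):
    """Greedy, error-driven selection of a small training subset from a labeled pool.

    Illustrative stand-in only: a polynomial plays the role of the NNP,
    x is a 1D descriptor, and y is the reference energy of each configuration.
    """
    rng = np.random.default_rng(seed)
    kept = rng.choice(len(x), size=n_init, replace=False).tolist()
    while True:
        # Retrain the surrogate on the configurations kept so far.
        coef = P.polyfit(x[kept], y[kept], degree)
        err = np.abs(P.polyval(x, coef) - y)
        # Stop when the whole pool is reproduced within tol,
        # or when another batch would exceed the budget.
        if err.max() < tol or len(kept) + batch > budget:
            return np.array(kept), coef
        err[kept] = -1.0  # exclude configurations that are already kept
        # Add the worst-predicted configurations to the training subset.
        kept.extend(np.argsort(err)[-batch:].tolist())


# Toy pool: 2000 "configurations" with a smooth reference energy.
x = np.linspace(-1.0, 1.0, 2000)
y = np.sin(3.0 * x)

kept, coef = distill(x, y)
max_err = np.max(np.abs(P.polyval(x, coef) - y))
print(f"kept {len(kept)} of {len(x)} configurations, max pool error {max_err:.2e}")
```

In an actual workflow, the surrogate would be the NNP itself and the error would be measured against reference energies and forces. A committee-disagreement criterion is a common alternative when labels for the pool are not yet available.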