Allegro-Legato: Scalable, Fast, and Robust Neural-Network Quantum Molecular Dynamics via Sharpness-Aware Minimization

Hikaru Ibayashi, Taufeq Mohammed Razakh, Liqiu Yang, Thomas Linker, Marco Olguin, Shinnosuke Hattori, Ye Luo, Rajiv K. Kalia, Aiichiro Nakano, Ken-ichi Nomura, Priya Vashishta

From arXiv. This paper was published at the International Supercomputing Conference 2023.

Neural-network quantum molecular dynamics (NNQMD) simulations based on machine learning are revolutionizing atomistic simulations of materials by providing quantum-mechanical accuracy at orders-of-magnitude higher speed, as illustrated by the ACM Gordon Bell Prize (2020) and a finalist (2021). A state-of-the-art (SOTA) NNQMD model founded on group theory, featuring rotational equivariance and local descriptors, has provided much higher accuracy and speed than those models and is thus named Allegro (meaning fast). On massively parallel supercomputers, however, it suffers from a fidelity-scaling problem, where a growing number of unphysical predictions of interatomic forces prohibits simulations involving larger numbers of atoms for longer times. Here, we solve this problem by combining the Allegro model with sharpness-aware minimization (SAM), which enhances the robustness of the model through improved smoothness of the loss landscape. The resulting Allegro-Legato (meaning fast and "smooth") model was shown to extend the time-to-failure $t_\textrm{failure}$ without sacrificing computational speed or accuracy. Specifically, Allegro-Legato exhibits a much weaker dependence of time-to-failure on the problem size, $t_{\textrm{failure}} \propto N^{-0.14}$ ($N$ is the number of atoms), compared to the SOTA Allegro model $\left(t_{\textrm{failure}} \propto N^{-0.29}\right)$, i.e., a systematically delayed time-to-failure, thus allowing much larger and longer NNQMD simulations without failure. The model also exhibits excellent computational scalability and GPU acceleration on the Polaris supercomputer at the Argonne Leadership Computing Facility. Such scalable, accurate, fast, and robust NNQMD models will likely find broad applications in NNQMD simulations on emerging exaflop/s computers, with a specific example of accounting for nuclear quantum effects in the dynamics of ammonia.
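
To make the SAM ingredient concrete, the following is a minimal sketch of the generic sharpness-aware update on a toy quadratic loss. It is not the Allegro-Legato training code: the loss, the curvature values, the function names (`loss`, `grad`, `sam_step`), and the hyperparameters `lr` and `rho` are all assumptions chosen for illustration. The idea it shows is that each step first moves the weights a distance `rho` along the normalized gradient (toward higher loss) and then applies the gradient evaluated at that perturbed point to the original weights.

```python
import math

# Hypothetical toy loss: an anisotropic quadratic standing in for a
# force/energy training loss. The curvatures are illustrative only.
CURVATURE = (1.0, 10.0)

def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURE, w))

def grad(w):
    return [c * x for c, x in zip(CURVATURE, w)]

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One sharpness-aware minimization step (generic form).

    1. Compute the gradient g at the current weights w.
    2. Perturb the weights by rho * g / ||g|| (ascent direction).
    3. Update w with the gradient taken at the perturbed weights.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        # Zero gradient: no ascent direction is defined, so leave w as is.
        return list(w)
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [x - lr * gx for x, gx in zip(w, g_adv)]

# Usage: run a few SAM steps from an arbitrary starting point.
w = [1.0, 1.0]
for _ in range(20):
    w = sam_step(w, grad)
print(loss(w))
```

Compared with plain gradient descent, each step costs two gradient evaluations instead of one; this overhead applies to training only, since the trained model is evaluated in the usual way.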
