Physical phenomena such as chemical reactions, bond breaking, and phase transitions require molecular dynamics (MD) simulations with ab initio accuracy over time scales ranging from microseconds to milliseconds. However, previous state-of-the-art neural-network-based MD packages such as DeePMD-kit reach only 4.7 nanoseconds per day on the Fugaku supercomputer. In this paper, we present a novel node-based parallelization scheme that reduces communication by 81%, and then optimize the computationally intensive kernels with sve-gemm and mixed precision. Finally, we implement intra-node load balancing to further improve scalability. Numerical results on the Fugaku supercomputer show that our work improves the time-to-solution of DeePMD-kit by a factor of 31.7, reaching 149 nanoseconds per day on 12,000 computing nodes. At this rate, one week of computation yields about one microsecond of simulated time, so this work opens the door to microsecond-scale simulation with ab initio accuracy within one week for the first time.
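The throughput figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical and are not part of DeePMD-kit, and the only inputs are the two rates stated in the abstract (4.7 and 149 nanoseconds per day).

```python
# Rates quoted in the abstract (nanoseconds of simulated time per wall-clock day).
BASELINE_NS_PER_DAY = 4.7     # previous DeePMD-kit rate on Fugaku
OPTIMIZED_NS_PER_DAY = 149.0  # reported rate on 12,000 computing nodes

NS_PER_MICROSECOND = 1_000.0  # unit conversion, not a measured value


def speedup(baseline_ns_per_day: float, optimized_ns_per_day: float) -> float:
    """Ratio of the optimized simulation rate to the baseline rate."""
    return optimized_ns_per_day / baseline_ns_per_day


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to cover target_ns of simulated time."""
    return target_ns / ns_per_day


if __name__ == "__main__":
    ratio = speedup(BASELINE_NS_PER_DAY, OPTIMIZED_NS_PER_DAY)
    days_new = days_to_simulate(NS_PER_MICROSECOND, OPTIMIZED_NS_PER_DAY)
    days_old = days_to_simulate(NS_PER_MICROSECOND, BASELINE_NS_PER_DAY)
    print(f"speedup: {ratio:.1f}x")
    print(f"days per microsecond at optimized rate: {days_new:.1f}")
    print(f"days per microsecond at baseline rate: {days_old:.1f}")
```

Under these figures the ratio comes to roughly 31.7, and one microsecond of simulated time takes a little under seven days at the optimized rate, compared with more than 200 days at the baseline rate.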