State-of-the-art AI deep potentials provide ab initio-quality results at a fraction of the computational cost of first-principles quantum mechanical calculations such as density functional theory. In this work, we bring AI deep potentials into GROMACS, a production-level molecular dynamics (MD) code, by integrating it with DeePMD-kit, which provides domain-specific deep learning (DL) models of interatomic potential energy and force fields. In particular, we enable AI deep potential inference across multiple Deep Potential (DP) model families and DL backends by coupling the GROMACS Neural Network Potentials interface with the C++/CUDA backend of DeePMD-kit. We evaluate two recent large-atom-model architectures in GROMACS: DPA2, which is based on the attention mechanism, and DPA3, which is based on graph neural networks (GNNs). The evaluation uses four ab initio-quality protein-in-water benchmarks (1YRF, 1UBQ, 3LZM, 2PTC) on NVIDIA A100 and GH200 GPUs. Our results show that DPA2 delivers up to 4.23x and 3.18x higher throughput than DPA3 on A100 and GH200 GPUs, respectively. We also provide a characterization study that further contrasts DPA2 and DPA3 in throughput, memory usage, and kernel-level execution on GPUs. Our findings identify kernel-launch overhead and domain-decomposed inference as the main optimization priorities for AI deep potentials in production MD simulations.