Large-scale atomistic simulations are essential for bridging computational materials science and chemistry to realistic materials and drug discovery applications. In the past few years, the rapid development of machine learning interatomic potentials (MLIPs) has offered a path to scaling up quantum mechanical calculations. Parallelizing these interatomic potentials across multiple devices is a challenging but promising approach to further extending simulation scales toward real-world applications. In this work, we present DistMLIP, an efficient distributed inference platform for MLIPs based on zero-redundancy, graph-level parallelization. In contrast to conventional spatial-partitioning parallelization, DistMLIP enables efficient MLIP parallelization through graph partitioning, allowing multi-device inference on flexible MLIP model architectures such as multi-layer graph neural networks. DistMLIP provides an easy-to-use, flexible, plug-in interface that enables distributed inference of pre-existing MLIPs. We demonstrate DistMLIP on four widely used, state-of-the-art MLIPs: CHGNet, MACE, TensorNet, and eSEN. We show that DistMLIP can simulate atomic systems 3.4x larger and up to 8x faster than previous multi-GPU methods, and that with DistMLIP, existing foundation potentials can perform near-million-atom calculations in a few seconds on 8 GPUs.
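To make the distinction between spatial partitioning and graph-level partitioning concrete, the sketch below shows one simple way a neighbor graph could be split across devices by assigning each edge to the device that owns its destination atom, with cross-device source atoms tracked as "halo" atoms whose features must be communicated before message passing. This is a minimal conceptual illustration only; all function and variable names here are hypothetical and do not reflect the actual DistMLIP API or partitioning algorithm.

```python
# Hypothetical sketch of graph-level partitioning for an MLIP-style
# message-passing step. Not the DistMLIP implementation.
import numpy as np

def build_neighbor_edges(positions, cutoff):
    """Brute-force neighbor list: edges (src, dst) with |r_src - r_dst| < cutoff."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    src, dst = np.nonzero((dist < cutoff) & (dist > 0.0))
    return np.stack([src, dst], axis=0)  # shape (2, n_edges)

def partition_edges(edges, n_atoms, n_devices):
    """Assign each edge to the device owning its destination atom.

    Atoms are split into contiguous blocks; an edge whose source atom lives
    on another device marks that source as a 'halo' atom whose features
    must be exchanged before the message-passing step.
    """
    owner = np.arange(n_atoms) * n_devices // n_atoms  # block ownership map
    parts = []
    for d in range(n_devices):
        mask = owner[edges[1]] == d              # edges aggregated on device d
        local_edges = edges[:, mask]
        halo = np.unique(local_edges[0][owner[local_edges[0]] != d])
        parts.append({"edges": local_edges, "halo_atoms": halo})
    return parts

# Toy usage: 1000 random atoms in a 20 Å box, 5 Å cutoff, 4 devices.
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 20.0, size=(1000, 3))
edges = build_neighbor_edges(pos, cutoff=5.0)
for d, part in enumerate(partition_edges(edges, len(pos), 4)):
    print(f"device {d}: {part['edges'].shape[1]} edges, "
          f"{len(part['halo_atoms'])} halo atoms")
```

Because edges rather than spatial regions are assigned to devices, a scheme of this kind can balance work for inhomogeneous atomic densities and for multi-layer graph neural networks whose receptive fields span several cutoff radii.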