Universal Machine Learning Interatomic Potentials (uMLIPs), pre-trained on massively diverse datasets encompassing inorganic materials and organic molecules across the entire periodic table, serve as foundational models for quantum-accurate physical simulations. However, uMLIP training requires second-order derivatives, for which corresponding parallel training frameworks are lacking; moreover, scaling to the billion-parameter regime causes explosive growth in computation and communication overhead, making training a formidable challenge. We introduce MatRIS-MoE, a billion-parameter Mixture-of-Experts model built upon an invariant architecture, and Janus, a pioneering high-dimensional distributed training framework for uMLIPs with hardware-aware optimizations. Deployed across two Exascale supercomputers, our code attains a peak performance of 1.2/1.0 EFLOPS (24%/35.5% of theoretical peak) in single precision at over 90% parallel efficiency, compressing the training of billion-parameter uMLIPs from weeks to hours. This work establishes a new high-water mark for AI-for-Science (AI4S) foundation models at Exascale and provides essential infrastructure for rapid scientific discovery.
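
The need for second-order derivatives can be illustrated with a toy example. Forces are the negative gradient of the predicted energy with respect to atomic coordinates, so the gradient of a force-matching loss with respect to model parameters involves a mixed second derivative of the energy. The sketch below is not the MatRIS-MoE model or the Janus framework: the one-parameter harmonic potential, the constant `R0`, and all function names are assumptions chosen only so the derivative can be written by hand and checked against a finite difference.

```python
# Toy potential (assumption for illustration): E(r; k) = 0.5 * k * (r - R0)**2
# r is an interatomic distance, k is the single trainable parameter.
R0 = 1.0  # hypothetical equilibrium distance


def energy(r, k):
    return 0.5 * k * (r - R0) ** 2


def force(r, k):
    # F = -dE/dr: first derivative with respect to the coordinate
    return -k * (r - R0)


def force_loss(r, k, f_ref):
    # Squared error between predicted and reference force
    return (force(r, k) - f_ref) ** 2


def force_loss_grad_k(r, k, f_ref):
    # dL/dk = 2 * (F - F_ref) * dF/dk, with dF/dk = -d^2E/(dk dr).
    # The mixed second derivative of E is what makes force training
    # a second-order problem.
    d2E_dk_dr = r - R0
    return 2.0 * (force(r, k) - f_ref) * (-d2E_dk_dr)


def finite_difference_grad_k(r, k, f_ref, eps=1e-6):
    # Central difference in k, used only to check the analytic gradient
    upper = force_loss(r, k + eps, f_ref)
    lower = force_loss(r, k - eps, f_ref)
    return (upper - lower) / (2.0 * eps)


r, k, f_ref = 1.3, 2.0, -0.9
analytic = force_loss_grad_k(r, k, f_ref)
numeric = finite_difference_grad_k(r, k, f_ref)
print(analytic, numeric)
```

In a real uMLIP the energy is a deep network and both derivatives come from automatic differentiation, which means the backward pass must itself be differentiated.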