The Particle-In-Cell (PIC) method is a computational technique widely used in plasma physics to model plasmas at the kinetic level. In this work, we present our effort to prepare the semi-implicit, energy-conserving PIC code ECsim for exascale architectures. To this end, we adopted a pragma-based acceleration strategy using OpenACC, which enables high performance while requiring minimal code restructuring. On the pre-exascale Leonardo system, the accelerated code achieves a $5 \times$ speedup and a $3 \times$ reduction in energy consumption compared to the CPU reference code. Performance comparisons across multiple NVIDIA GPU generations show substantial benefits from the GH200 unified memory architecture. Finally, strong and weak scaling tests on Leonardo show parallel efficiencies of $70 \%$ on up to 64 GPUs and $78 \%$ on up to 1024 GPUs, respectively.
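To illustrate what a pragma-based strategy looks like in practice, the following is a minimal sketch of an OpenACC-annotated particle loop. It is not taken from ECsim: the `Particles` container, the `push_particles` function, and the simple explicit update in a uniform field are assumptions made purely for illustration (ECsim itself uses a semi-implicit scheme). The point is only that the loop body is left unchanged and a single directive marks it for offloading.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle container
// (an assumption for this sketch, not ECsim's actual data layout).
struct Particles {
    std::vector<double> x;  // positions
    std::vector<double> v;  // velocities
};

// Advance all particles by one explicit step in a uniform field E.
// qom is the charge-to-mass ratio and dt the time step.
// Illustrative only: this is not the semi-implicit ECsim mover.
void push_particles(Particles& p, double E, double qom, double dt) {
    assert(p.x.size() == p.v.size());
    const std::size_t n = p.x.size();
    double* x = p.x.data();
    double* v = p.v.data();

    // The directive asks an OpenACC compiler to run the loop on the
    // accelerator, copying both arrays to the device and back.
    // A compiler without OpenACC enabled ignores the pragma and runs
    // the same loop serially on the CPU.
    #pragma acc parallel loop copy(x[0:n], v[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        v[i] += qom * E * dt;  // velocity update from the field
        x[i] += v[i] * dt;     // position update with the new velocity
    }
}
```

With the NVIDIA HPC SDK, a file like this would typically be built with `nvc++ -acc`, and with GCC using `-fopenacc`. Without those flags the same source compiles as ordinary serial C++, which is what keeps code restructuring minimal in a directive-based approach.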