Large-scale plasma simulations are critical for designing and developing next-generation fusion energy devices and for modeling industrial plasmas. BIT1 is a massively parallel Particle-in-Cell code designed specifically for studying plasma-material interaction in fusion devices. Its most salient characteristic is the inclusion of Monte Carlo collision models for different plasma species. In this work, we characterize the single-node, multi-node, and I/O performance of the BIT1 code in two realistic test cases using several HPC profilers, including perf, IPM, Extrae/Paraver, and Darshan. We find that the on-node performance of the BIT1 sorting function is the main performance bottleneck. Strong scaling tests show a parallel efficiency of 77% and 96% on 2,560 MPI ranks for the two test cases. We demonstrate that communication, load imbalance, and self-synchronization are important factors affecting the performance of BIT1 in large-scale runs.