In this paper, we describe our approach to developing a simulation code for the fully kinetic Vlasov equation, which will be used to explore physics beyond the gyrokinetic model. Because of the high dimensionality of the problem, simulating the fully kinetic Vlasov equation requires efficient use of compute and storage resources. In addition, the implementation needs to be extensible with regard to the physical model and flexible with regard to the hardware used for production runs. We start with the algorithmic background for simulating the 6-D Vlasov equation with a semi-Lagrangian algorithm. We then present the performance-portable software stack, which enables production runs on pure CPU nodes as well as on nodes accelerated with AMD or Nvidia GPUs. The extensibility of our implementation is guaranteed by the described software architecture of the main kernel, which achieves a memory bandwidth of almost 500 GB/s on an Nvidia V100 GPU and around 100 GB/s on an Intel Xeon Gold CPU using a single code base. We provide performance data for multiple node-level architectures and discuss which hardware capabilities are utilized and which remain available. Finally, the network communication bottleneck of 6-D grid-based algorithms is quantified. A verification of physics beyond gyrokinetic theory, using ion Bernstein waves as an example, concludes the work.
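
To illustrate the semi-Lagrangian idea named above, the following is a minimal one-dimensional sketch, not the 6-D implementation described in this work. It assumes a constant advection velocity, a periodic grid, and linear interpolation at the foot of each characteristic; the function name `semi_lagrangian_step` and its parameters are hypothetical and chosen only for this example. Production codes typically use higher-order interpolation and split the 6-D problem into lower-dimensional advections.

```python
import math


def semi_lagrangian_step(f, velocity, dt, dx):
    """Advance f by one step of df/dt + v df/dx = 0 on a periodic grid.

    Each grid point traces its characteristic back by v*dt and takes the
    linearly interpolated value found at that departure point.
    (Illustrative assumption: constant velocity, linear interpolation.)
    """
    n = len(f)
    shift = velocity * dt / dx  # displacement in units of grid cells
    new_f = [0.0] * n
    for i in range(n):
        foot = i - shift             # departure point, in cell units
        left = math.floor(foot)      # index of the grid point left of the foot
        weight = foot - left         # fractional distance past that point
        new_f[i] = (1.0 - weight) * f[left % n] + weight * f[(left + 1) % n]
    return new_f


# A shift by exactly one cell moves the profile without distortion.
print(semi_lagrangian_step([0.0, 1.0, 2.0, 3.0], velocity=1.0, dt=1.0, dx=1.0))
# prints [3.0, 0.0, 1.0, 2.0]
```

Because the departure point is interpolated rather than restricted to a neighbouring cell, the time step in this sketch is not limited by a CFL condition; the price is the interpolation error introduced whenever the shift is not a whole number of cells.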