This work presents GAL{\AE}XI, a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes that leverages the parallel computing power of modern Graphics Processing Units (GPUs). GAL{\AE}XI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) and uses shock capturing with a finite-volume subcell approach to ensure the stability of the high-order scheme near shocks. This work provides details on the general code design, the parallelization strategy, and the implementation approach for the compute kernels, with a focus on the element-local mappings between volume and surface data required by the unstructured mesh. GAL{\AE}XI exhibits excellent strong scaling properties up to 1024 GPUs if each GPU is assigned a minimum of one million degrees of freedom. To verify the implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and compressible formulations of the Taylor-Green vortex at Mach numbers of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and match those of the original CPU implementation. Finally, GAL{\AE}XI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and the shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GAL{\AE}XI requires less than half of the energy to carry out this simulation in comparison to the reference CPU implementation. This renders GAL{\AE}XI a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.