This work presents GALAEXI, a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes that leverages the parallel computing power of modern Graphics Processing Units (GPUs). GALAEXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) and uses shock capturing based on a finite-volume subcell approach to keep the high-order scheme stable near shocks. This work details the general code design, the parallelization strategy, and the implementation of the compute kernels, with a focus on the element-local mappings between volume and surface data that the unstructured mesh requires. GALAEXI exhibits excellent strong scaling on up to 1024 GPUs, provided that each GPU is assigned at least one million degrees of freedom. To verify the implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and the compressible formulations of the Taylor-Green vortex at Mach numbers of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and match those of the original CPU implementation. Finally, GALAEXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and the shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GALAEXI requires less than half of the energy needed by the reference CPU implementation to carry out this simulation. This makes GALAEXI a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.