High-fidelity simulations of unsteady fluid flow have become feasible with advances in high-performance computing hardware and software frameworks. Because computational fluid dynamics (CFD) computations are dominated by linear algebra routines, they can be accelerated significantly through massive parallelization on graphics processing units (GPUs). GPU implementations of high-fidelity CFD solvers are therefore essential for reducing turnaround time and enabling faster design space exploration. In the present work, an in-house flow solver based on the immersed boundary method (IBM) has been ported to the GPU using OpenACC, a compiler directive-based framework for heterogeneous parallel programming. Of the various GPU porting pathways available, OpenACC was chosen for its minimal code intrusion, short development time, and close similarity to OpenMP, a directive-based framework for shared-memory parallel programming. A detailed validation study and a performance analysis of the parallel CPU and GPU implementations of the solver are presented. The GPU implementation achieves speedups of up to $O(10)$ over the CPU-parallel version and up to $O(10^2)$ over the serial code. It also scales well with increasing mesh size, owing to efficient utilization of the GPU processor cores.
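The low code intrusion and the similarity between OpenACC and OpenMP can be illustrated with a small stencil kernel. The following is a minimal sketch, not the authors' solver code. It assumes one Jacobi sweep for a Poisson equation $-\nabla^2 u = f$ on a uniform Cartesian grid with spacing $h$, stored row-major. The function names `jacobi_sweep_omp` and `jacobi_sweep_acc` are hypothetical. Both versions share the same loop body, and only the directive line differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (illustration only): one Jacobi sweep for -lap(u) = f
 * on an nx-by-ny grid, row-major index i*ny + j, uniform spacing h.
 * Only interior points of unew are written; boundary values are untouched. */

/* CPU shared-memory version: OpenMP threads share the collapsed loop nest. */
void jacobi_sweep_omp(int nx, int ny, double h,
                      const double *restrict u, const double *restrict f,
                      double *restrict unew)
{
    assert(nx >= 3 && ny >= 3);
    #pragma omp parallel for collapse(2)
    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j)
            unew[i * ny + j] = 0.25 * (u[(i - 1) * ny + j] + u[(i + 1) * ny + j]
                                     + u[i * ny + j - 1] + u[i * ny + j + 1]
                                     + h * h * f[i * ny + j]);
}

/* GPU version: identical loop body, only the directive changes.
 * The data clauses move the arrays to and from the device on every call;
 * this is simple but costly, and a production solver would normally keep
 * the arrays resident on the device with an enclosing "acc data" region. */
void jacobi_sweep_acc(int nx, int ny, double h,
                      const double *restrict u, const double *restrict f,
                      double *restrict unew)
{
    assert(nx >= 3 && ny >= 3);
    #pragma acc parallel loop collapse(2) copyin(u[0:nx*ny], f[0:nx*ny]) copy(unew[0:nx*ny])
    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j)
            unew[i * ny + j] = 0.25 * (u[(i - 1) * ny + j] + u[(i + 1) * ny + j]
                                     + u[i * ny + j - 1] + u[i * ny + j + 1]
                                     + h * h * f[i * ny + j]);
}
```

A compiler that is not asked to honour a given directive family ignores those pragmas, so the same source builds serially, with OpenMP (e.g. GCC's `-fopenmp`), or with OpenACC (e.g. GCC's `-fopenacc` or the NVIDIA HPC SDK's `nvc -acc`). This is the sense in which a directive-based port leaves the numerical code largely unchanged.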