The never-ending computational demand of turbulence simulations makes computational fluid dynamics (CFD) a prime use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction because they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture (MSA) at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several flow cases on several MSA-based computer systems. We observe that, for our simulations, incorporating different computing architectures is seldom worthwhile because of the communication overhead and load-balancing issues it incurs, especially when I/O is also considered. However, when the simulation at hand requires more than the combined global memory of the GPUs, using additional CPUs to increase the available memory can be beneficial. We support our results with a simple performance model that assesses when running across modules might pay off. As MSA becomes more widespread and efforts to increase system utilization grow more important, our results give insight into when and how a monolithic application can spread out over more than one module and obtain a faster time to solution.
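The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the performance model used in this work; it is a minimal illustration under stated assumptions: both modules compute concurrently, each processes elements at a fixed rate, a constant coupling cost per time step is paid whenever the domain spans both modules, and the GPUs can hold at most a fixed number of elements. All names and numbers (`step_time`, `best_split`, `gpu_rate`, `cpu_rate`, `coupling_cost`, `gpu_capacity`) are hypothetical.

```python
def step_time(n_gpu, n_cpu, gpu_rate, cpu_rate, coupling_cost):
    """Toy estimate of the wall time for one time step.

    Assumption: the modules run concurrently, so the slower one
    determines the compute time.
    """
    compute = max(n_gpu / gpu_rate, n_cpu / cpu_rate)
    # Assumption: the coupling cost is paid only if both modules
    # hold part of the domain.
    overhead = coupling_cost if (n_gpu > 0 and n_cpu > 0) else 0.0
    return compute + overhead


def best_split(n_elems, gpu_rate, cpu_rate, coupling_cost, gpu_capacity):
    """Brute-force search over the number of elements placed on CPUs.

    Returns (time_per_step, n_cpu) for the fastest feasible split,
    or None if no split fits. Splits that exceed the (hypothetical)
    GPU memory capacity are skipped.
    """
    best = None
    for n_cpu in range(n_elems + 1):
        n_gpu = n_elems - n_cpu
        if n_gpu > gpu_capacity:
            continue  # does not fit in GPU memory
        t = step_time(n_gpu, n_cpu, gpu_rate, cpu_rate, coupling_cost)
        if best is None or t < best[0]:
            best = (t, n_cpu)
    return best


# Case fits on the GPUs: the coupling cost outweighs the extra CPU throughput.
fits = best_split(1000, gpu_rate=100.0, cpu_rate=10.0,
                  coupling_cost=2.0, gpu_capacity=1000)

# Case exceeds GPU memory: CPUs must take the remainder of the domain.
too_big = best_split(1000, gpu_rate=100.0, cpu_rate=10.0,
                     coupling_cost=2.0, gpu_capacity=800)
```

With these made-up parameters, the first case keeps every element on the GPUs, while the second is forced to place elements on the CPUs and accepts a slower step in exchange for being able to run at all. This mirrors the qualitative conclusion of the abstract, not its measured results.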