An efficient multi-GPU implementation for the Discontinuous Galerkin ocean model SLIM

Unstructured-mesh ocean models are increasingly used for coastal applications due to their ability to represent complex geometries and apply local grid refinement where needed. However, their broader use has been hindered by their high computational cost, particularly for models based on the Discontinuous Galerkin finite element (DG-FE) method, which involves significantly more degrees of freedom than traditional finite volume or continuous finite element approaches. The rapid emergence of GPU-based high-performance computing architectures now offers a pathway to address this limitation, as DG-FE formulations are inherently well suited to massively parallel, element-wise computations. Here, we present a full 3D DG-FE ocean model implementation optimized for both single- and multi-GPU systems, with support for both NVIDIA and AMD architectures. We detail the computational strategies employed to achieve high performance, including memory layout optimization, kernel-level parallelization, and matrix-free solvers for key vertical processes. Benchmark results demonstrate that a single HPC-grade GPU (e.g. NVIDIA A100) delivers performance equivalent to approximately 1500 CPU cores, while replacing a 128-core CPU node with a 4xA100 GPU node yields a speedup of around 50x. Weak-scaling efficiency is maintained up to 1024 GPUs. We further demonstrate the model's capabilities on a real-world application in the Great Barrier Reef, achieving a spatial resolution five times finer than the most accurate existing model while maintaining a physical-to-numerical time ratio of 100. These results highlight how GPU-accelerated DG-FE methods can dramatically advance the capabilities of unstructured-mesh ocean modeling, enabling ultra-high-resolution coastal simulations that were previously infeasible.
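The headline figures above can be cross-checked with simple arithmetic. The sketch below is not from the paper: the function names are hypothetical, it assumes the four GPUs in a node scale ideally, and it reads the "physical-to-numerical time ratio" as simulated time divided by wall-clock time, which is one interpretation.

```python
# Rough consistency check of the quoted benchmark figures.
# Assumptions (not from the paper): ideal scaling across the GPUs of one
# node, and "physical-to-numerical time ratio" = simulated / wall-clock time.

def node_speedup(cores_per_gpu_equiv: float, gpus_per_node: int,
                 cores_per_cpu_node: int) -> float:
    """Speedup of a GPU node over a CPU node under ideal intra-node scaling."""
    return cores_per_gpu_equiv * gpus_per_node / cores_per_cpu_node


def wall_clock_days(simulated_days: float, time_ratio: float) -> float:
    """Wall-clock days needed to simulate `simulated_days` at a given ratio."""
    return simulated_days / time_ratio


if __name__ == "__main__":
    # 1 A100 ~ 1500 CPU cores; 4 GPUs per node; 128 cores per CPU node.
    print(f"estimated node speedup: {node_speedup(1500, 4, 128):.1f}x")
    # One simulated year at a ratio of 100.
    print(f"wall-clock days per simulated year: {wall_clock_days(365, 100):.2f}")
```

Under these assumptions the estimate comes out at roughly 47x, in line with the reported speedup of around 50x, and a simulated year would take a little under four days of wall-clock time.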
