Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations, which must resolve physics precisely in localized regions of expansive domains. The extreme heterogeneity of today's supercomputers poses a significant challenge for dynamically adaptive codes and makes performance portability at scale essential. Our research focuses on astrophysical simulations, particularly stellar mergers, to elucidate early universe dynamics. We present Octo-Tiger, which leverages Kokkos, HPX, and SIMD to achieve portable performance at scale in complex, massively parallel, adaptive multi-physics simulations. Octo-Tiger supports diverse processors, accelerators, and network backends. Experiments demonstrate exceptional scalability across several heterogeneous supercomputers, including Perlmutter, Frontier, and Fugaku, encompassing major GPU architectures and x86, ARM, and RISC-V CPUs. We achieve a parallel efficiency of 47.59% on a full-system run on Perlmutter (110,080 cores and 6,880 hybrid A100 GPUs; 26% of HPCG peak performance) and 51.37% on Frontier (32,768 cores and 2,048 MI250X GPUs).