The First OpenFOAM HPC Challenge (OHC-1)

The first OpenFOAM HPC Challenge (OHC-1) was organised by the OpenFOAM HPC Technical Committee (HPCTC) to collect a snapshot of OpenFOAM's computational performance on contemporary production hardware and to compare hardware-constrained submissions with software-track optimisations. Participants ran a common incompressible steady-state RANS case, the open-closed cooling DrivAer (occDrivAer) configuration, on prescribed meshes, and submitted results either with the reference setup (hardware track) or with modified solvers, decomposition strategies, or accelerator offloading (software track). In total, 12 contributors submitted 237 valid data points: 175 in the hardware track and 62 in the software track. The hardware track covered 25 distinct CPU models across AMD, Intel, and ARM families, with runs ranging from a single node up to 256 nodes (32,768 CPU cores). Wall-clock times ranged from 7.8 minutes to 65.7 hours, and reported energy-to-solution ranged from 2.1 to 236.9 kWh. Analysis of the hardware track identified a Pareto front of runs that best balance time-to-solution against energy-to-solution, and revealed that on-package high-bandwidth memory (HBM) dominates single-node performance for next-generation CPUs. Software-track submissions achieved up to 28% lower energy per iteration, 17% higher maximum performance per node, and 72% shorter minimum time per iteration than the best hardware-track results, with full GPU ports and selective-memory optimisations leading the performance range. This manuscript describes the challenge organisation, the case setup, and the metrics, presents the main findings from both tracks, and gives an outlook for future challenges.
