Astrophysical simulations are computation-, memory-, and thus energy-intensive, and therefore require new hardware advances for continued progress. Stony Brook University recently expanded its computing cluster "SeaWulf" with 94 new nodes featuring Intel Sapphire Rapids Xeon Max series CPUs. We present a performance and power-efficiency study of this hardware performed with FLASH, a multi-scale, multi-physics, adaptive-mesh-based software instrument. We extend this study to compare performance against that of Stony Brook's Ookami testbed, which features ARM-based A64FX-700 processors, and of SeaWulf's AMD EPYC Milan and Intel Skylake nodes. Our application is a stellar explosion known as a thermonuclear (Type Ia) supernova; for this 3D problem, FLASH includes operators for hydrodynamics, gravity, and nuclear burning, in addition to routines for the material equation of state. We perform a strong-scaling study with a 220 GB problem size to explore both single- and multi-node performance. Our study examines the performance of different MPI mappings and the distribution of processors across nodes. From these tests, we determine the optimal configuration for balancing runtime and energy consumption for our application.