The high-performance computing (HPC) community has recently seen a substantial diversification of hardware platforms and their associated programming models. From traditional multicore processors to highly specialized accelerators, vendors and tool developers continue to support the relentless progress of these architectures. In scientific programming, it is therefore essential to consider performance portability frameworks, i.e., software tools that allow programmers to write code once and run it on different computer architectures without sacrificing performance. We report here on the benefits and challenges of performance portability using a field-line tracing simulation and a particle-in-cell code. Both are relevant applications in computational plasma physics that support research on magnetically confined nuclear fusion energy. For these applications, we report performance results obtained on four HPC platforms with server-class CPUs from Intel (Xeon) and AMD (EPYC) and high-end GPUs from Nvidia and AMD, including the latest Nvidia H100 GPU and the novel AMD Instinct MI300A APU. Our results show that both Kokkos and OpenMP are powerful tools for achieving performance portability and decent "out-of-the-box" performance, even on the very latest hardware platforms. For our applications, Kokkos provided performance portability across the broadest range of hardware architectures from different vendors.