Ensuring high productivity in scientific software development necessitates developing and maintaining a single codebase that can run efficiently on a range of accelerator-based supercomputing platforms. While prior work has investigated the performance portability of a few selected proxy applications or programming models, this paper provides a comprehensive study of a range of proxy applications implemented in the major programming models suitable for GPU-based platforms. We present and analyze performance results across NVIDIA and AMD GPU hardware currently deployed in leadership-class computing facilities, using a representative range of scientific codes and several programming models: CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL. Based on the specific characteristics of the applications tested, we include recommendations to developers on how to choose the right programming model for their code. We find that, empirically, Kokkos and RAJA in particular offer the most promise as performance-portable programming models. These results provide a comprehensive evaluation of the extent to which each programming model for heterogeneous systems provides true performance portability in real-world usage.