Portability is critical to sustaining high productivity in the development and maintenance of scientific software as the diversity of on-node hardware architectures increases. While several programming models provide portability across diverse GPU platforms, they make no guarantees about performance portability. In this work, we explore seven programming models -- CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL -- to study whether their performance is consistently good across NVIDIA and AMD GPUs. We use five proxy applications from different scientific domains, creating implementations where they are missing, and present a comprehensive comparative evaluation of the programming models. We provide a Spack-based scripting methodology to ensure the reproducibility of the experiments conducted in this work. Finally, we attempt to answer the question: to what extent does each programming model provide performance portability for heterogeneous systems in real-world usage?