Until recently, AMD platforms did not support offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work showed that StdPar can achieve good performance across NVIDIA and Intel GPU platforms. In that work, we acknowledged AMD's past efforts such as HCC, which unfortunately is deprecated and does not support newer hardware platforms. Recent developments by AMD, Codeplay, and AdaptiveCpp (previously known as hipSYCL or OpenSYCL) have enabled multiple paths for StdPar programs to run on AMD GPUs. This informal report discusses our experiences with, and evaluation of, the currently available StdPar implementations for AMD GPUs. We conduct benchmarks using our suite of HPC mini-apps, which have ports in many heterogeneous programming models, including StdPar. We then compare the performance of StdPar, using all available StdPar compilers, to the contemporary heterogeneous programming models supported on AMD GPUs: HIP, OpenCL, Thrust, Kokkos, OpenMP, and SYCL. Where appropriate, we discuss issues encountered and workarounds applied during our evaluation. Finally, the StdPar model discussed in this report depends heavily on Unified Shared Memory (USM) performance, and very few AMD GPUs have proper support for this feature. We therefore demonstrate a proof-of-concept host-side userspace page-fault solution for models that use the HIP API, and we discuss the performance improvements achieved with this solution on the same set of benchmarks.
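For readers unfamiliar with the model, the following is a minimal sketch of what a StdPar kernel looks like. It is not taken from the report: the function name `triad` and the choice of a stream-triad style operation are illustrative assumptions. The code is plain ISO C++17, and whether it runs on the CPU or is offloaded to a GPU depends entirely on the compiler and the flags used to build it.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Hypothetical helper (not from the report): a stream-triad style kernel,
// a[i] = b[i] + scalar * c[i], written with a C++17 parallel algorithm.
void triad(std::vector<double>& a, const std::vector<double>& b,
           const std::vector<double>& c, double scalar) {
    // All three arrays must have the same length.
    assert(a.size() == b.size() && b.size() == c.size());

    // The par_unseq policy permits parallel and vectorised execution.
    // A StdPar-capable compiler may offload this call to the GPU.
    std::transform(std::execution::par_unseq,
                   b.begin(), b.end(),  // first input range
                   c.begin(),           // second input range
                   a.begin(),           // output range
                   [scalar](double bi, double ci) { return bi + scalar * ci; });
}
```

Note that the vectors are ordinary heap allocations with no explicit device buffers or copies. When the algorithm is offloaded, the GPU must access that same memory, which is why the model is so sensitive to USM support and performance.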