The high-performance computing (HPC) landscape is changing rapidly, with growing emphasis on energy-efficient, heterogeneous computing environments. This study extends our previous research on SYCL's performance portability by evaluating it across a broader range of computing architectures: CPUs, GPUs, and hybrid CPU-GPU configurations from NVIDIA, Intel, and AMD. Our analysis covers single-GPU, multi-GPU, single-CPU, and hybrid CPU-GPU setups, using two common bioinformatics applications as case studies. The results show that SYCL performs comparably to CUDA on NVIDIA GPUs and achieves similar architectural efficiency on AMD and Intel GPUs in the majority of cases tested. SYCL also proved effective on CPUs from several manufacturers, including Intel's latest hybrid architectures. In hybrid CPU-GPU configurations, SYCL showed excellent functional portability, but performance varied significantly with the specific hardware combination. Some performance limitations were identified in multi-GPU and CPU-GPU configurations; these are attributed primarily to workload distribution strategies rather than to SYCL-specific constraints. These findings position SYCL as a promising unified programming model for heterogeneous computing environments, particularly for bioinformatics applications.