As interest in FPGA-based accelerators for HPC applications grows, new challenges arise, particularly around programming and portability. This paper provides a snapshot of the current state of FPGA tooling and its problems. To that end, we evaluate the performance portability of two frameworks for developing FPGA solutions for HPC, SYCL and OpenCL, by using them to port a highly parallel application to FPGAs with both ND-range and single-task kernels. Developers are generally advised to write single-task kernels for FPGAs, as these are commonly regarded as better suited to such hardware. However, we found that when a highly parallel application is programmed with high-level approaches such as OpenCL and SYCL and without FPGA-tailored optimizations, ND-range kernels significantly outperform single-task codes. Specifically, while SYCL struggles to produce efficient FPGA implementations of applications written as single-task codes, it performs unexpectedly well with ND-range kernels.