eGPU, a recently reported soft GPGPU for FPGAs, has demonstrated very high clock frequencies (more than 750 MHz) and a small footprint. This means that, for the first time, commercial soft processors may be competitive for the kind of heavy numerical computation common in FPGA-based digital signal processing. In this paper we take a deep dive into the performance of the eGPU family on FFT computation, in order to quantify the performance gap between state-of-the-art soft processors and commercial IP cores specialized for this task. In the process, we propose two novel architectural features for the eGPU that improve the efficiency of the design by 50% when executing FFTs. The end result is that our modified GPGPU has only 3 times the performance-area product of a specialized IP core, yet, as a programmable processor, can execute arbitrary software-defined algorithms. Further comparison to Nvidia A100 GPGPUs demonstrates the superior efficiency of eGPU on FFTs of the sizes studied (256- to 4096-point).