FPGAs are gaining traction in cloud and edge computing environments due to their hardware flexibility, low latency, and low energy consumption. However, the existing FPGA hardware stack and host-FPGA connectivity do not allow flexible scaling or simultaneous reconfiguration of multiple devices, which limits the adoption of FPGAs at scale. In this paper, we present SAF, an Ethernet-based scalable acceleration framework that allows FPGAs to be hot-plugged into a network as stand-alone devices, without being attached to a local host CPU, which enables flexible scaling. SAF provides a custom FPGA shell and a set of Ethernet protocols that allow FPGAs to connect to a remote host and accelerate application kernels. SAF can configure multiple FPGAs simultaneously, which significantly reduces reconfiguration time when scaling. We implemented SAF using the Intel FPGA SDK for OpenCL and 20 Bittware 385A cards with Arria-10 FPGAs. We analyze a case study and conduct experiments to compare SAF with state-of-the-art multi-FPGA clusters. Results show that SAF provides 13x faster reconfiguration than sequential PCIe programming and reduces hardware setup costs by 38%, application runtime by 25%, and energy consumption by 27%. We evaluate the performance scalability of SAF using the PTRANS benchmark of the HPCC FPGA benchmark suite and show almost linear speedup in both strong and weak scaling scenarios.
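For readers less familiar with the scaling terminology, the following is a minimal sketch of how strong and weak scaling are commonly quantified. It uses the standard textbook definitions, not SAF's measurement code; the function names and the timings in the usage example are hypothetical and are not results from the paper.

```python
def strong_scaling_speedup(t_one: float, t_n: float) -> float:
    """Speedup for a fixed total problem size: T(1) / T(N)."""
    return t_one / t_n


def strong_scaling_efficiency(t_one: float, t_n: float, n_devices: int) -> float:
    """Fraction of ideal linear speedup achieved on n_devices (1.0 is ideal)."""
    return strong_scaling_speedup(t_one, t_n) / n_devices


def weak_scaling_efficiency(t_one: float, t_n: float) -> float:
    """Problem size grows with the device count, so the ideal runtime stays
    constant; efficiency is T(1) / T(N), with 1.0 being ideal."""
    return t_one / t_n


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to illustrate the formulas.
    print(strong_scaling_speedup(8.0, 2.0))        # fixed problem, 1 vs 4 devices
    print(strong_scaling_efficiency(8.0, 2.0, 4))  # same run, normalized by device count
    print(weak_scaling_efficiency(5.0, 5.0))       # per-device work held constant
```

Under these definitions, "almost linear speedup" corresponds to a strong scaling efficiency close to 1.0 as devices are added, and to a weak scaling efficiency that stays close to 1.0 as the problem size and device count grow together.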