Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All of these prior works place every component needed for FHE on a single (monolithic) chip and thereby offer high performance. However, they suffer from the practical problems of large-scale chip design, such as inflexibility, low yield, and high manufacturing cost. In this paper, we present `REED', the first multi-chiplet-based FHE accelerator, which overcomes the limitations of prior monolithic designs. To exploit the advantages of multi-chiplet structures while matching the performance of larger monolithic systems, we propose and implement several novel strategies in the context of FHE. These include a scalable chiplet design approach, an effective framework for workload distribution, a custom inter-chiplet communication strategy, and advanced pipelined Number Theoretic Transform and automorphism designs to enhance performance. Experimental results demonstrate that the REED 2.5D microprocessor occupies 96.7 mm$^2$ of chip area and consumes 49.4 W of average power in 7nm technology. It achieves a speedup of up to 2,991x over a CPU (24-core 2xIntel X5690) and offers 1.9x better performance, along with a 50% reduction in development costs, compared to state-of-the-art ASIC FHE accelerators. Furthermore, our work presents the first benchmark of encrypted deep neural network (DNN) training. Overall, the REED architecture offers a highly effective solution for accelerating FHE, significantly advancing the practicality and deployability of FHE in real-world applications.