Fully Homomorphic Encryption (FHE) has emerged as a promising technology for processing encrypted data without the need for decryption. Despite its potential, its practical deployment has been held back by substantial computational overhead. To address this issue, we propose the $first$ chiplet-based FHE accelerator design, `REED', which enables scalability and offers high throughput, thereby enhancing homomorphic encryption deployment in real-world scenarios. The design accounts for well-known wafer yield issues during fabrication, which significantly impact production costs. In contrast to state-of-the-art approaches, we also address data exchange overhead by proposing a non-blocking inter-chiplet communication strategy. We incorporate novel pipelined Number Theoretic Transform and automorphism techniques that leverage parallelism to provide high throughput. Experimental results demonstrate that the REED 2.5D integrated circuit occupies 177 mm$^2$ of chip area and consumes 82.5 W average power in 7 nm technology. It achieves a speedup of up to 5,982$\times$ compared to a CPU (24-core 2$\times$Intel X5690), as well as 2$\times$ better energy efficiency and 50\% lower development cost than a state-of-the-art ASIC accelerator. To evaluate its practical impact, we are the $first$ to benchmark encrypted deep neural network training. Overall, this work enhances the practicality and deployability of fully homomorphic encryption in real-world scenarios.