Fully Homomorphic Encryption (FHE), a cryptographic technique that enables computation directly on ciphertext, offers significant security benefits but is hampered by substantial performance overhead. In recent years, a series of accelerator designs has significantly improved the performance of FHE applications, bringing them closer to real-world use. However, these accelerators require large on-chip memory and area. In addition, FHE algorithms are developing rapidly, leaving previous accelerator designs poorly matched to newly optimized FHE applications. In this paper, we conduct a detailed analysis of existing applications under the new FHE method and make two key observations: 1) the bottleneck of FHE applications shifts from NTT to the inner-product operation, and 2) the optimal $\alpha$ of KeySwitch changes as the multiplicative level decreases. Based on these observations, we design an accelerator named Taiyi, which includes dedicated hardware for the inner-product operation and optimizes the NTT and BConv operations through algorithmic derivation. A comparative evaluation against previous state-of-the-art designs shows that Taiyi achieves an average performance improvement of 1.5x and reduces area overhead by 15.7%.