Fully homomorphic encryption (FHE) enables computations on encrypted data without decryption, offering strong data privacy at the expense of substantial computational and memory overheads. Prior efforts have steadily improved FHE performance through cryptographic and algorithmic enhancements or hardware acceleration, yet these two directions have progressed largely in isolation, hindering the full exploitation of available hardware capabilities. This work presents WHET, which introduces memory-centric, architecture-aware optimizations to better align cryptographic and algorithmic constructions with FHE accelerator architectures. We identify conventional FHE constructions as major sources of excessive working sets and heavy off-chip memory traffic. We propose accelerator-specific techniques, including fine-grained coefficient-to-slot transformation, plaintext compression, and intermediate modulus raising, to reduce the on-chip data footprint by minimizing temporary ciphertexts and plaintext loads. With these techniques applied, we observe additional opportunities to improve on-chip memory efficiency; hence, we introduce lightweight architectural refinements, including a special-purpose buffer and functional unit extensions. With these optimizations, WHET achieves 1.38-8.74$\times$ per-area performance improvements over state-of-the-art FHE accelerators and the first-ever sub-millisecond CKKS bootstrapping.