Homomorphic encryption (HE) enables computation on encrypted data by concealing information under noise for security. However, bootstrapping, which resets the noise level in a ciphertext, is computationally expensive and requires a large bootstrapping key. The TFHE scheme offers a faster, programmable bootstrapping algorithm (PBS), which is crucial for security-focused applications such as machine learning. Nevertheless, TFHE currently lacks support for ciphertext packing, resulting in low throughput. This work thoroughly analyzes TFHE bootstrapping, identifies the GPU bottleneck caused by the blind rotation fragmentation problem, and proposes a hardware TFHE accelerator called Strix. Strix introduces a two-level batching approach to increase the PBS batch size, uses a specialized microarchitecture for efficient streaming data processing, and incorporates a fully pipelined FFT microarchitecture to improve performance. It achieves significantly higher throughput than state-of-the-art implementations on both CPUs and GPUs and outperforms existing TFHE accelerators by a factor of 7.4.