Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to change privacy considerations in the cloud, but its computational and memory overheads are preventing adoption. TFHE is a promising torus-based FHE scheme that relies on bootstrapping, the noise-removal step invoked after each encrypted logical or arithmetic operation. We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT is the first hardware accelerator to exploit the inherent noise present in FHE calculations. Instead of double- or single-precision floating-point arithmetic, it implements TFHE bootstrapping entirely with approximate fixed-point arithmetic. Using an in-depth analysis of noise propagation in the FFT computations of bootstrapping, FPT is able to use noise-trimmed fixed-point representations that are up to 50% smaller than those of prior implementations. FPT is built as a streaming processor inspired by traditional streaming DSPs: it instantiates directly cascaded high-throughput computational stages, with minimal control logic and routing networks. We explore throughput-balanced compositions of streaming kernels with a user-configurable streaming width in order to construct a full bootstrapping pipeline. Our approach allows 100% utilization of arithmetic units and requires only a small bootstrapping key cache, enabling an entirely compute-bound bootstrapping throughput of one bootstrap (BS) every 35 µs. This is in stark contrast to the classical CPU approach to FHE bootstrapping acceleration, which is typically constrained by memory and bandwidth. FPT is implemented and evaluated as a bootstrapping FPGA kernel for an Alveo U280 datacenter accelerator card. FPT achieves two to three orders of magnitude higher bootstrapping throughput than existing CPU-based implementations, and 2.5× higher throughput than recent ASIC emulation experiments.
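To illustrate why reduced-precision fixed-point arithmetic can be tolerable in FFT-based polynomial multiplication, the following minimal Python sketch multiplies two integer polynomials modulo X^N + 1 using a DFT whose twiddle factors and stage outputs are rounded to a fixed number of fractional bits, and compares the result with the exact schoolbook product. It is not FPT's design: the naive O(n^2) DFT, the function names (`quantize`, `dft`, `negacyclic_fft`, `max_error`), the example polynomials `A` and `B`, and the chosen bit widths are all assumptions made for illustration, and the sketch models neither TFHE ciphertexts nor the streaming FPGA pipeline.

```python
import cmath


def quantize(z, frac_bits):
    """Round both parts of z to a fixed-point grid with step 2**-frac_bits."""
    scale = 1 << frac_bits
    z = complex(z)
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)


def dft(values, frac_bits, inverse=False):
    """Naive O(n^2) DFT with quantized twiddle factors and quantized outputs."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m, v in enumerate(values):
            w = cmath.exp(sign * 2j * cmath.pi * m * k / n)
            acc += v * quantize(w, frac_bits)
        out.append(quantize(acc / n if inverse else acc, frac_bits))
    return out


def negacyclic_exact(a, b):
    """Schoolbook product of a and b modulo X^N + 1 over the integers."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                c[i + j] += ai * bj
            else:
                c[i + j - n] -= ai * bj  # X^N wraps around as -1
    return c


def negacyclic_fft(a, b, frac_bits):
    """Same product via a zero-padded length-2N DFT; returns unrounded reals."""
    n = len(a)
    fa = dft(list(a) + [0] * n, frac_bits)
    fb = dft(list(b) + [0] * n, frac_bits)
    full = dft([x * y for x, y in zip(fa, fb)], frac_bits, inverse=True)
    # Fold the linear product back into N coefficients using X^N = -1.
    return [(full[i] - full[i + n]).real for i in range(n)]


def max_error(a, b, frac_bits):
    """Largest absolute deviation of the fixed-point result from the exact one."""
    approx = negacyclic_fft(a, b, frac_bits)
    exact = negacyclic_exact(a, b)
    return max(abs(x - y) for x, y in zip(approx, exact))


# Hypothetical example inputs (N = 8); not parameters from the paper.
A = [3, -1, 4, 1, -5, 9, 2, -6]
B = [2, 7, -1, 8, 2, -8, 1, 8]

if __name__ == "__main__":
    for bits in (4, 8, 16, 24):
        print(bits, max_error(A, B, bits))
```

Running the script prints the maximum coefficient error for each bit width. Any error that stays well below the noise the scheme already tolerates is effectively free, which is the intuition behind trimming precision; how many bits are actually sufficient for TFHE bootstrapping depends on the noise analysis of the real parameter sets, which this sketch does not reproduce.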