Fully Homomorphic Encryption (FHE) allows computation over encrypted data to be outsourced to untrusted servers without risking data breaches. Because FHE is extremely computationally intensive, application-specific accelerators have emerged as a powerful way to narrow the performance gap. Nonetheless, owing to the growing complexity of FHE schemes themselves and of the multi-scheme FHE algorithm designs used in end-to-end privacy-preserving tasks, existing FHE accelerators often suffer from low hardware utilization and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we design the layered near-memory architecture around fine-grained functional units, which significantly improves the utilization of both computational resources and memory bandwidth. In addition, we propose a multi-scheme operator compiler that efficiently schedules high-level FHE computations across lower-level functional units. In our experiments, we evaluate APACHE on various FHE applications, including Lola MNIST, HELR, fully-packed bootstrapping, and fully homomorphic processors. The results show that APACHE outperforms state-of-the-art ASIC FHE accelerators by 2.4x to 19.8x across a variety of operator and application benchmarks.