Incorporating fully homomorphic encryption (FHE) into the inference process of a convolutional neural network (CNN) has drawn considerable attention as a viable approach to private inference (PI). FHE allows the entire computation to be delegated to the server while preserving the confidentiality of sensitive client-side data. However, a practical FHE implementation of a CNN faces significant hurdles, primarily because of FHE's substantial computational and memory overhead. To address these challenges, we propose a set of optimizations that includes GPU/ASIC acceleration, an efficient activation function, and an optimized packing scheme. We evaluate our method with ResNet models on the CIFAR-10 and ImageNet datasets, achieving an improvement of several orders of magnitude over prior work and reducing the latency of encrypted CNN inference to 1.4 seconds on an NVIDIA A100 GPU. We also show that the latency drops to 0.03 seconds with a custom hardware design.