Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead. We observe that transforming the DNN weights into circulant matrices converts general matrix-vector multiplications into HE-friendly 1-dimensional convolutions, drastically reducing the HE computation cost. Hence, in this paper, we propose PrivCirNet, a protocol/network co-optimization framework based on block circulant transformation. At the protocol level, PrivCirNet customizes the HE encoding algorithm to be fully compatible with the block circulant transformation and reduces the computation latency in proportion to the block size. At the network level, we propose a latency-aware formulation to search for the layer-wise block size assignment based on second-order information. PrivCirNet also leverages layer fusion to further reduce the inference cost. We compare PrivCirNet with the state-of-the-art HE-based framework Bolt (IEEE S\&P 2024) and the HE-friendly pruning method SpENCNN (ICML 2023). For ResNet-18 and Vision Transformer (ViT) on Tiny ImageNet, PrivCirNet reduces latency by $5.0\times$ and $1.3\times$ with iso-accuracy over Bolt, respectively, and improves accuracy by $4.1\%$ and $12\%$ over SpENCNN, respectively. For MobileNetV2 on ImageNet, PrivCirNet achieves $1.7\times$ lower latency and $4.2\%$ better accuracy over Bolt and SpENCNN, respectively. Our code and checkpoints are available in the supplementary materials.
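To make the core observation concrete, the following minimal NumPy sketch (illustrative only, not part of the paper's protocol or code release) shows that multiplying by a circulant matrix is exactly a 1-dimensional circular convolution of the activation vector with the matrix's defining vector; the FFT-based evaluation here is just a plaintext stand-in for the cheaper HE-friendly computation.

\begin{verbatim}
import numpy as np

def circulant(first_col):
    """Circulant matrix whose j-th column is the first column rolled down by j."""
    n = len(first_col)
    return np.stack([np.roll(first_col, j) for j in range(n)], axis=1)

rng = np.random.default_rng(0)
n = 8
w = rng.standard_normal(n)   # defining vector of the circulant weight block
x = rng.standard_normal(n)   # activation vector

# General matrix-vector product with the circulant weight matrix.
y_matvec = circulant(w) @ x

# Equivalent 1-D circular convolution (computed via FFT for illustration).
y_conv = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

assert np.allclose(y_matvec, y_conv)  # both computations agree
\end{verbatim}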