Privacy-preserving machine learning (PPML) solutions are gaining widespread popularity. Among these, many rely on homomorphic encryption (HE), which offers confidentiality of the model and the data but at the cost of high latency and memory requirements. Pruning neural network (NN) parameters improves latency and memory in plaintext ML but has little impact if directly applied to HE-based PPML. We introduce a framework called HE-PEx that comprises new pruning methods, built on top of a packing technique called tile tensors, for reducing the latency and memory of PPML inference. HE-PEx uses permutations to prune additional ciphertexts, and expansion to recover inference loss. We demonstrate the effectiveness of our methods for pruning fully-connected and convolutional layers in NNs on PPML tasks, namely image compression, denoising, and classification, with autoencoders, multilayer perceptrons (MLPs), and convolutional neural networks (CNNs). We implement and deploy our networks atop a framework called HElayers, and observe a 10-35% improvement in inference speed and a 17-35% decrease in memory requirement over the unpruned network, corresponding to 33-65% fewer ciphertexts, within a 2.5% degradation in inference accuracy. Compared to the state-of-the-art pruning technique for PPML, our techniques generate networks with 70% fewer ciphertexts, on average, for the same degradation limit.
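The key intuition behind permutation-aided pruning can be illustrated in the clear: under tile-tensor packing, a weight matrix is split into fixed-size tiles, each mapped to one ciphertext, and a tile can be skipped only if it is entirely zero. Magnitude pruning scatters zeros, so few tiles become empty; permuting rows (with the corresponding permutation absorbed by the adjacent layer) can cluster the zeros into whole tiles. The following minimal sketch, using a toy matrix and a hypothetical 2x2 tile size (not HElayers' actual packing or API), shows the effect:

```python
import numpy as np

# Toy "pruned" weight matrix: magnitude pruning emptied rows 0 and 2,
# yet no 2x2 tile is all-zero in the original layout.
W = np.array([
    [0., 0., 0., 0.],
    [1., 2., 3., 4.],
    [0., 0., 0., 0.],
    [5., 6., 7., 8.],
])

TILE = 2  # hypothetical tile edge for illustration

def zero_tiles(M, t=TILE):
    """Count t x t tiles that are entirely zero.

    Under tile-tensor packing, each such tile corresponds to a
    ciphertext that the HE runtime never has to store or multiply.
    """
    return sum(
        not M[i:i + t, j:j + t].any()
        for i in range(0, M.shape[0], t)
        for j in range(0, M.shape[1], t)
    )

# Permute rows so sparse rows cluster together (zero rows first).
# In a network, this neuron reordering is compensated by permuting
# the next layer's inputs, so the function computed is unchanged.
perm = np.argsort(np.count_nonzero(W, axis=1), kind="stable")
W_perm = W[perm]

print(zero_tiles(W))       # -> 0 prunable tiles before permuting
print(zero_tiles(W_perm))  # -> 2 prunable tiles after permuting
```

The same count of nonzero weights yields zero prunable ciphertexts before the permutation and two afterward, which is why pruning alone helps little in HE while permutation-aided pruning does.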