Privacy-preserving neural network inference has been well studied, whereas homomorphic CNN training remains an open and challenging problem. In this paper, we present a practical solution for privacy-preserving CNN training based solely on Homomorphic Encryption (HE). To the best of our knowledge, this is the first successful attempt at this task. Several techniques combine to make this possible: (1) with transfer learning, privacy-preserving CNN training can be reduced to homomorphic neural network training, or even to multiclass logistic regression (MLR) training; (2) we apply $\texttt{Quadratic Gradient}$, a faster gradient variant for MLR with state-of-the-art convergence speed, to achieve high performance; (3) we transform the problem of approximating the Softmax function in the encrypted domain into the well-studied problem of approximating the Sigmoid function, and develop a new type of loss function to complement this change; and (4) we use a simple but flexible matrix-encoding method named $\texttt{Volley Revolver}$ to manage the data flow in the ciphertexts, which is the key to completing the whole homomorphic CNN training. The complete, runnable C++ code implementing our work is available at https://github.com/petitioner/HE.CNNtraining. We select $\texttt{REGNET\_X\_400MF}$ as the pre-trained model for transfer learning. We use the first 128 MNIST training images as training data and the whole MNIST test set as test data. The client needs to upload only 6 ciphertexts to the cloud, and performing 2 iterations on a cloud server with 64 vCPUs takes $\sim 21$ minutes, resulting in a precision of $21.49\%$.
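To illustrate why point (3) reduces to a polynomial approximation problem, the following is a minimal plaintext sketch, not the authors' implementation. HE schemes evaluate additions and multiplications natively, so a non-polynomial function such as Sigmoid is typically replaced by a low-degree polynomial. The degree (3), the interval $[-8, 8]$, the least-squares fitting procedure, and the helper names `fit_polynomial` and `eval_poly` are all assumptions chosen for illustration; the coefficients actually used in the paper's code may differ.

```python
import math


def sigmoid(x):
    """Reference Sigmoid, evaluated in plaintext."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_polynomial(f, degree, lo, hi, samples=401):
    """Least-squares fit of f on [lo, hi] (hypothetical helper).

    Returns coefficients c[0..degree] of sum_k c[k] * x**k.
    """
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    n = degree + 1
    # Normal equations A c = b, with A[j][k] = sum x^(j+k), b[j] = sum f(x) x^j.
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(f(x) * x ** j for x in xs) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    c = [0.0] * n
    for j in reversed(range(n)):
        s = sum(A[j][k] * c[k] for k in range(j + 1, n))
        c[j] = (b[j] - s) / A[j][j]
    return c


def eval_poly(coeffs, x):
    """Horner evaluation: uses only additions and multiplications."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Assumed settings: degree 3 on [-8, 8].
coeffs = fit_polynomial(sigmoid, degree=3, lo=-8.0, hi=8.0)
# Largest absolute deviation from Sigmoid on a grid over the same interval.
max_err = max(
    abs(eval_poly(coeffs, x / 10) - sigmoid(x / 10)) for x in range(-80, 81)
)
print("coefficients:", coeffs)
print("max abs error on [-8, 8]:", max_err)
```

Because `eval_poly` uses only additions and multiplications, the same arithmetic could in principle be carried out on ciphertexts. Outside the fitting interval the polynomial diverges from Sigmoid, so the encrypted inputs would need to stay within the chosen range.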