Privacy-Preserving CNN Training with Transfer Learning: Multiclass Logistic Regression

From arXiv: this work takes a first step toward privacy-preserving CNN training based solely on HE techniques by presenting a faster HE-friendly algorithm.

In this paper, we present a practical solution for privacy-preserving CNN training based solely on Homomorphic Encryption (HE). To the best of our knowledge, this is the first successful attempt at this problem, and we are not aware of any prior work that achieves it. Several techniques combine to accomplish the task: (1) with transfer learning, privacy-preserving CNN training can be reduced to homomorphic neural network training, or even to multiclass logistic regression (MLR) training; (2) a faster gradient variant called $\texttt{Quadratic Gradient}$, an enhanced gradient method for MLR with state-of-the-art convergence speed, is applied to achieve high performance; (3) we apply the mathematical idea of transformation to turn the problem of approximating the Softmax function in the encrypted domain into that of approximating the Sigmoid function, and we develop a new loss function, termed $\texttt{Squared Likelihood Error}$, to align with this change; and (4) we use a simple but flexible matrix-encoding method named $\texttt{Volley Revolver}$ to manage the data flow in the ciphertexts, which is the key to completing the whole homomorphic CNN training. The complete, runnable C++ code implementing our work can be found at: \href{https://github.com/petitioner/HE.CNNtraining}{$\texttt{https://github.com/petitioner/HE.CNNtraining}$}. We select $\texttt{REGNET\_X\_400MF}$ as our pre-trained model for transfer learning. We use the first 128 MNIST training images as training data and the whole MNIST testing dataset as testing data. The client only needs to upload 6 ciphertexts to the cloud, and it takes $\sim 21$ minutes to perform 2 iterations on a cloud with 64 vCPUs, resulting in a precision of $21.49\%$.
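To make point (1) concrete, here is a minimal plaintext sketch of what training reduces to once the pre-trained network is frozen: fitting a multiclass logistic regression on fixed feature vectors. It is not the paper's method. It uses ordinary batch gradient descent with a standard Softmax, with no encryption, no $\texttt{Quadratic Gradient}$, no $\texttt{Squared Likelihood Error}$ and no $\texttt{Volley Revolver}$ encoding. The function names, the hyperparameters and the toy 2-D "features" (standing in for outputs of a frozen model such as $\texttt{REGNET\_X\_400MF}$) are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_mlr(features, labels, num_classes, lr=0.5, iterations=300):
    """Fit weights (num_classes x (dim + 1)) by plain batch gradient descent
    on the softmax cross-entropy loss. The last column is the bias.
    Assumption: this is the textbook update, not Quadratic Gradient."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * (dim + 1) for _ in range(num_classes)]
    for _ in range(iterations):
        grad = [[0.0] * (dim + 1) for _ in range(num_classes)]
        for x, y in zip(features, labels):
            xb = x + [1.0]  # append constant 1 for the bias term
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in weights])
            for k in range(num_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim + 1):
                    grad[k][j] += err * xb[j] / n
        for k in range(num_classes):
            for j in range(dim + 1):
                weights[k][j] -= lr * grad[k][j]
    return weights


def predict(weights, x):
    """Return the index of the highest-scoring class for feature vector x."""
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def make_toy_features(per_class=20, seed=0):
    """Hypothetical stand-in for features from a frozen pre-trained network:
    three well-separated 2-D clusters, one per class."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    features, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            features.append([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)])
            labels.append(label)
    return features, labels


features, labels = make_toy_features()
weights = train_mlr(features, labels, num_classes=3)
accuracy = sum(predict(weights, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

In the encrypted setting, the Softmax call is the step that cannot be evaluated directly, which is why the abstract replaces it with a Sigmoid approximation and a matching loss function.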
