Privacy-preserving machine learning is a class of cryptographic methods that aim to analyze private and sensitive data while preserving privacy, such as homomorphic logistic regression training over large encrypted data. In this paper, we propose an efficient algorithm for logistic regression training on large encrypted data using Homomorphic Encryption (HE): a mini-batch version of recent methods that use a faster gradient variant called $\texttt{quadratic gradient}$. The $\texttt{quadratic gradient}$ is claimed to integrate curvature information (the Hessian matrix) into the gradient and thereby effectively accelerate first-order gradient (descent) algorithms. We also implement the full-batch version of these methods for the case where the encrypted dataset is so large that it must be encrypted in mini batches. We compare our mini-batch algorithm with our full-batch implementation on real financial data consisting of 422,108 samples with 200 features. Given the inefficiency of HE, our results are encouraging and demonstrate that logistic regression training on large encrypted datasets is practically feasible.
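To illustrate the idea behind the $\texttt{quadratic gradient}$, the following is a minimal plaintext sketch (not the paper's encrypted implementation): the raw logistic-regression gradient is scaled elementwise by a diagonal preconditioner $B$ built from a fixed bound on the Hessian. The use of the Böhning-style bound $\frac{1}{4}X^\top X$ and its absolute row sums is an assumption for illustration; the exact construction may differ in the referenced methods.

```python
import numpy as np

def quadratic_gradient_step(X, y, w, lr=1.0, eps=1e-8):
    """One plaintext training step using a quadratic-gradient-style update.

    Assumed construction (illustrative, not necessarily the paper's):
    precondition the log-likelihood gradient g with a diagonal matrix B,
    where B_ii = 1 / (eps + sum_j |H_ij|) and H = (1/4) X^T X is a fixed
    bound on the logistic-regression Hessian.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid predictions
    g = X.T @ (y - p)                     # gradient of the log-likelihood
    H = 0.25 * (X.T @ X)                  # fixed Hessian bound
    B = 1.0 / (eps + np.abs(H).sum(axis=1))  # diagonal preconditioner
    return w + lr * (B * g)               # ascent step on the likelihood
```

Because $B$ approximates an inverse-Hessian scaling, a learning rate near 1 is typically usable, which is part of why this variant can converge in fewer iterations than a plain first-order step.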