Privacy-preserving regression in machine learning is a crucial area of research, aimed at enabling the use of powerful machine learning techniques while protecting individuals' privacy. In this paper, we implement privacy-preserving regression training on data encrypted under a fully homomorphic encryption scheme. We first examine the common linear regression algorithm and propose a (simplified) fixed Hessian for linear regression training, which can be applied to any dataset, even one not normalized into the range $[0, 1]$. We also generalize this constant Hessian matrix to the ridge regression setting, i.e., linear regression with a regularization term that penalizes large coefficients. Our main contribution, however, is a novel and efficient algorithm called LFFR for homomorphic regression using the logistic function, which can model more complex relations between inputs and predictions than linear regression. We also derive a constant simplified Hessian to train our LFFR algorithm with a Newton-like method, and compare it with our new fixed Hessian linear regression training on two real-world datasets. We suggest normalizing not only the data but also the target predictions, even for ordinary linear regression used in a privacy-preserving manner: this helps keep the weights in a small range, say $[-5, +5]$, which is convenient for setting the parameters used to refresh ciphertexts, and avoids tuning the regularization parameter $\lambda$ via cross-validation. Linear regression with normalized predictions can thus be a viable alternative to ridge regression.
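To make the fixed Hessian idea concrete, the sketch below illustrates one common way a Newton-like step with a constant diagonal surrogate Hessian can be built for linear regression (the function name and details are illustrative assumptions, not the paper's exact construction). The squared-loss Hessian $H = X^\top X$ is already constant; replacing it with a diagonal matrix $B$ whose $k$-th entry is $\sum_j |H_{kj}|$ gives $B \succeq H$, so the iteration still converges while $B^{-1}$ reduces to element-wise reciprocals, which is far cheaper to evaluate under homomorphic encryption than a full matrix inverse.

```python
import numpy as np

def simplified_fixed_hessian_lr(X, y, iters=100):
    """Newton-like linear regression training with a diagonal
    simplified fixed Hessian (a hedged sketch; names and defaults
    are illustrative, not taken from the paper).

    The squared-loss Hessian H = X^T X is constant.  We replace it
    with a diagonal matrix B, B_kk = sum_j |H_kj|, so that B >= H in
    the Loewner order and inverting B is just element-wise division.
    """
    n, d = X.shape
    H = X.T @ X                     # constant Hessian of 0.5*||Xw - y||^2
    b = np.abs(H).sum(axis=1)       # diagonal surrogate: row sums of |H|
    b_inv = 1.0 / b                 # "inverse Hessian" is element-wise
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y)    # gradient of the squared loss
        w -= b_inv * grad           # Newton-like step with fixed B
    return w
```

In plaintext this just recovers the least-squares solution more slowly than a direct solve; the point is that every operation (matrix products, element-wise scaling) maps naturally onto homomorphic arithmetic, where data-dependent division and matrix inversion are impractical.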