Privacy-preserving regression in machine learning is a crucial area of research, aimed at enabling the use of powerful machine learning techniques while protecting individuals' privacy. In this paper, we implement privacy-preserving regression training on data encrypted under a fully homomorphic encryption scheme. We first examine the common linear regression algorithm and propose a (simplified) fixed Hessian for linear regression training, which can be applied to any dataset, even one not normalized into the range $[0, 1]$. We also generalize this constant Hessian matrix to the ridge regression version, namely, linear regression with a regularization term that penalizes large coefficients. Our main contribution, however, is a novel and efficient algorithm called LFFR for homomorphic regression using the logistic function, which can model more complex relations between inputs and the output prediction than linear regression can. We also find a constant simplified Hessian to train our LFFR algorithm with a Newton-like method, and compare it with our new fixed-Hessian linear regression training on two real-world datasets. We suggest normalizing not only the data but also the target predictions, even for ordinary linear regression used in a privacy-preserving manner; this helps keep the weights in a small range, say $[-5, +5]$, which is convenient when setting parameters for ciphertext refreshing, and avoids tuning the regularization parameter $\lambda$ via cross-validation. Linear regression with normalized predictions can thus be a viable alternative to ridge regression.
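The fixed-Hessian idea can be illustrated in the clear (outside the encrypted setting). The sketch below is an assumption-laden plain-Python illustration, not the paper's implementation: it bounds the constant Hessian $X^\top X$ of least-squares by a diagonal matrix via row-sum (diagonal-dominance) bounding, so each Newton-like step only multiplies the gradient elementwise by precomputed reciprocals — the kind of simplification that is friendly to homomorphic evaluation, where general matrix inversion is impractical.

```python
import numpy as np

def fixed_hessian_linreg(X, y, iters=2000):
    """Newton-like linear regression with a diagonal simplified fixed Hessian.

    For the loss 1/2 * ||Xw - y||^2 the true Hessian is the constant matrix
    X^T X.  We replace it by the diagonal bound H_kk = sum_j |(X^T X)_{kj}|,
    which dominates X^T X (Gershgorin), so the update w <- w - g / h is a
    monotone majorize-minimize step needing only elementwise division by
    constants that could be precomputed as plaintext reciprocals.
    """
    XtX = X.T @ X
    h = np.abs(XtX).sum(axis=1)      # diagonal simplified fixed Hessian
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = X.T @ (X @ w - y)        # gradient of 1/2 * ||Xw - y||^2
        w -= g / h                   # elementwise "inverse-Hessian" step
    return w

# Tiny demo on synthetic data: recover the weights of a noiseless linear model.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
X = np.hstack([np.ones((200, 1)), X])        # bias column
w_true = np.array([0.5, -1.0, 2.0, 0.3])
y = X @ w_true
w_hat = fixed_hessian_linreg(X, y)
```

Because the diagonal matrix dominates the true Hessian, the iteration converges linearly to the least-squares solution, just more slowly than full Newton; the trade-off is that no encrypted matrix inversion is ever required.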