A common approach in system identification and machine learning is to fit a model to training data so that it predicts test data instances as accurately as possible. Concerns about data privacy are increasingly raised in this setting, but they are not always addressed. We present a secure protocol for learning a linear model, relying on a recently described technique called real number secret sharing. Taking the PAC-Bayesian bounds as our starting point, we derive a closed form for the model parameters that depends on the data and on the prior from the PAC-Bayesian bounds. Obtaining the model parameters requires solving a linear system. We consider the situation where several parties hold different data instances and are not willing to give up the privacy of their data. Hence, we suggest using real number secret sharing and multiparty computation to share the data and solve the linear regression problem securely, without violating the privacy of the data. We propose two methods, a secure inverse method and a secure Gaussian elimination method, and compare them at the end. The benefit of using secret sharing directly on real numbers is reflected in the simplicity of the protocols and the number of rounds needed. The drawback is that a share might leak a small amount of information; in our analysis we argue that this leakage is small.
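The data flow described in the abstract (parties share their data, the shares are combined, and a linear system is solved for the model parameters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sharing is a plain additive scheme with Gaussian masks rather than the real number secret sharing scheme the abstract refers to, the linear system is a ridge-style system with a hypothetical regularization weight `lam` and prior mean `w0`, and the system is solved in the clear after reconstructing the aggregated matrices, whereas the proposed secure inverse and secure Gaussian elimination methods operate on the shares themselves. The function names `share`, `reconstruct`, `secure_aggregate`, and `fit` are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def share(secret, n_parties, sigma=1e3):
    """Split a real array into n additive shares (illustrative scheme only)."""
    secret = np.asarray(secret, dtype=float)
    # n-1 shares are random Gaussian masks; the last one makes the sum correct.
    masks = rng.normal(0.0, sigma, size=(n_parties - 1,) + secret.shape)
    last = secret - masks.sum(axis=0)
    return [*masks, last]

def reconstruct(shares):
    """Recover the secret by summing all shares."""
    return np.sum(shares, axis=0)

def secure_aggregate(local_values, n_parties):
    """Sum the parties' private arrays without revealing any single one."""
    received = [[] for _ in range(n_parties)]
    for value in local_values:
        # Party j receives the j-th share of every party's value.
        for j, s in enumerate(share(value, n_parties)):
            received[j].append(s)
    # Each party adds the shares it holds; only these partial sums are opened.
    partial = [np.sum(r, axis=0) for r in received]
    return reconstruct(partial)

def fit(parties, lam=1.0, w0=None):
    """Solve an assumed ridge-style system (A + lam*I) w = b + lam*w0."""
    d = parties[0][0].shape[1]
    w0 = np.zeros(d) if w0 is None else w0
    n = len(parties)
    A = secure_aggregate([X.T @ X for X, _ in parties], n)
    b = secure_aggregate([X.T @ y for X, y in parties], n)
    # Simplification: solved in the clear, not on shares.
    return np.linalg.solve(A + lam * np.eye(d), b + lam * w0)

# Three parties, each holding 20 private rows of a 3-feature data set.
w_true = np.array([1.0, -2.0, 0.5])
parties = []
for _ in range(3):
    X = rng.normal(size=(20, 3))
    y = X @ w_true + 0.01 * rng.normal(size=20)
    parties.append((X, y))

w_secure = fit(parties)

# Reference solution computed on the pooled plaintext data.
X_all = np.vstack([X for X, _ in parties])
y_all = np.concatenate([y for _, y in parties])
w_plain = np.linalg.solve(X_all.T @ X_all + np.eye(3), X_all.T @ y_all)
```

In this sketch the aggregated result agrees with the pooled plaintext solution up to floating-point rounding. Because the masks are real-valued rather than uniform over a finite field, an individual share is not perfectly independent of the secret, which is the kind of small leakage the abstract mentions.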