Gaussian process regression (GPR) is a non-parametric model used in many real-world applications that involve sensitive personal data (e.g., healthcare and finance) held by multiple data owners. To fully and securely exploit the value of these different data sources, this paper proposes a privacy-preserving GPR method based on secret sharing (SS), a secure multi-party computation (SMPC) technique. In contrast to existing studies that protect data privacy in GPR via homomorphic encryption, differential privacy, or federated learning, our proposed method is more practical and preserves the privacy of both the model inputs and outputs across various data-sharing scenarios (e.g., horizontally or vertically partitioned data). However, applying SS directly to the conventional GPR algorithm is non-trivial, because the algorithm includes operations whose accuracy and/or efficiency have not been adequately addressed in current SMPC protocols. To address this issue, we derive a new SS-based exponentiation operation through the idea of 'confusion-correction' and construct an SS-based matrix inversion algorithm based on Cholesky decomposition. More importantly, we theoretically analyze the communication cost and the security of the proposed SS-based operations. Empirical results show that our proposed method achieves reasonable accuracy and efficiency while preserving data privacy.
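To make the notion of secret sharing concrete, the following is a minimal sketch of additive secret sharing with fixed-point encoding. It is an illustration under stated assumptions, not the protocol proposed in this paper: the ring size (`RING_BITS`), the fixed-point precision (`FRAC_BITS`), and all function names are hypothetical choices made for this example.

```python
import secrets

RING_BITS = 64           # assumption: shares live in the integers modulo 2**64
MOD = 1 << RING_BITS
FRAC_BITS = 16           # assumption: 16 fractional bits of fixed-point precision


def encode(x):
    """Map a real number to a fixed-point element of the ring."""
    return round(x * (1 << FRAC_BITS)) % MOD


def decode(v):
    """Map a ring element back to a real number (upper half is negative)."""
    if v >= MOD // 2:
        v -= MOD
    return v / (1 << FRAC_BITS)


def share(x, n_parties=2):
    """Split x into n additive shares that sum to encode(x) modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((encode(x) - sum(shares)) % MOD)
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares modulo MOD."""
    return decode(sum(shares) % MOD)


def add_shares(a, b):
    """Each party adds its own two shares locally; no communication needed."""
    return [(ai + bi) % MOD for ai, bi in zip(a, b)]


if __name__ == "__main__":
    a = share(1.5)
    b = share(-2.25)
    print(reconstruct(add_shares(a, b)))
```

Each share on its own is a uniformly random ring element, so a single party learns nothing about the secret. Addition is purely local, whereas non-linear operations such as exponentiation and matrix inversion need dedicated protocols, which is the gap the abstract describes.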
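For reference, the plaintext computation that such a method has to reproduce on shared data can be sketched as follows. This is an ordinary, non-private illustration of where exponentiation (in the kernel) and Cholesky-based inversion (in the posterior mean) appear in GPR. The squared-exponential kernel, the scalar inputs, the `noise` parameter, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalar inputs (assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def cholesky(A):
    """Return lower-triangular L with A = L L^T for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_spd(A, b):
    """Solve A x = b via Cholesky: forward substitution, then back substitution."""
    n = len(A)
    L = cholesky(A)
    z = [0.0] * n
    for i in range(n):  # solve L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # solve L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_mean(X, y, x_star, noise=1e-2):
    """Posterior mean k_*^T (K + noise * I)^{-1} y at one test input."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_spd(K, y)
    return sum(rbf_kernel(x_star, X[i]) * alpha[i] for i in range(n))


if __name__ == "__main__":
    print(gpr_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.5))
```

The Cholesky route avoids forming an explicit inverse and relies only on additions, multiplications, divisions, and square roots, the kind of primitive operations an SS-based protocol would need to support.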