Principal Component Analysis (PCA) is a pivotal technique in machine learning and data analysis. In this study, we present a novel approach for privacy-preserving PCA using an approximate numerical arithmetic homomorphic encryption scheme. We build our method upon a PCA routine known as the PowerMethod, which takes the covariance matrix as input and produces an approximate eigenvector corresponding to the first principal component of the dataset. Our method surpasses previous approaches (e.g., Pandas CSCML 21) in efficiency, accuracy, and scalability. To achieve this efficiency and accuracy, we implemented the following optimizations: (i) We optimized a homomorphic matrix multiplication technique (Jiang et al. SIGSAC 2018) that plays a crucial role in the computation of the covariance matrix. (ii) We devised an efficient homomorphic circuit for computing the covariance matrix. (iii) We designed a novel and efficient homomorphic circuit for the PowerMethod that incorporates a systematic strategy for homomorphic vector normalization, enhancing both its accuracy and practicality. Our matrix multiplication optimization reduces the minimum rotation key space required for a $128\times 128$ homomorphic matrix multiplication by up to 64\%, enabling more extensive parallel computation of multiple matrix multiplication instances. Our homomorphic covariance matrix computation method computes the covariance matrix of the MNIST dataset ($60000\times 256$) in 51 minutes. Our privacy-preserving PCA scheme, based on our new homomorphic PowerMethod circuit, computes the top 8 principal components of datasets such as MNIST and Fashion-MNIST in approximately 1 hour with an $R^2$ accuracy of 0.7 to 0.9, running on average more than 4 times faster and with higher accuracy than previous approaches.
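For readers unfamiliar with the PowerMethod, the following is a minimal plaintext sketch of the idea: repeatedly multiply a vector by the covariance matrix and renormalize it until it aligns with the dominant eigenvector. It is not the homomorphic circuit described above. The function names (`covariance_matrix`, `power_method`, `top_k_components`), the iteration count, and the use of deflation to obtain more than one component are all assumptions made for illustration; the abstract does not state how the further components are extracted. The sketch uses plain Python lists to stay dependency-free.

```python
import math
import random


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def covariance_matrix(rows):
    """Sample covariance of a dataset given as a list of feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def power_method(cov, num_iters=200, seed=0):
    """Approximate the dominant eigenvector and eigenvalue of cov."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in cov])
    for _ in range(num_iters):
        # Multiply, then renormalize so the vector neither blows up nor vanishes.
        v = normalize(mat_vec(cov, v))
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


def top_k_components(cov, k, num_iters=200):
    """Top-k eigenpairs via power iteration plus deflation (an assumption here)."""
    work = [row[:] for row in cov]  # copy so the caller's matrix is untouched
    d = len(work)
    pairs = []
    for i in range(k):
        v, lam = power_method(work, num_iters, seed=i)
        pairs.append((v, lam))
        # Deflation: subtract the component just found so the next run finds the next one.
        for a in range(d):
            for b in range(d):
                work[a][b] -= lam * v[a] * v[b]
    return pairs
```

In plaintext the division by the norm inside `normalize` is trivial. Under an approximate arithmetic homomorphic scheme, which natively offers only additions and multiplications, that step has no direct counterpart, which is presumably why the abstract singles out vector normalization as a design concern.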