Scalable and Privacy-Preserving Federated Principal Component Analysis

David Froelicher, Hyunghoon Cho, Manaswitha Edupalli, Joao Sa Sousa, Jean-Philippe Bossuat, Apostolos Pyrgelis, Juan R. Troncoso-Pastoriza, Bonnie Berger, Jean-Pierre Hubaux

From arXiv. Also published at the IEEE Symposium on Security and Privacy, 2023.

Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
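The claim that results match a centralized solution "independently of the data distribution among the parties" can be illustrated with a small cleartext sketch. This is not the SF-PCA protocol and uses no encryption; it only assumes that each party shares additive summary statistics (row count, per-feature sums, and sums of outer products), which is one simple way to see why an exact federated result need not depend on how rows are split. All function names and the toy data below are hypothetical.

```python
# Cleartext illustration only: no encryption, and not the SF-PCA protocol.

def local_stats(rows):
    """Return (count, per-feature sums, sum of outer products) for one party."""
    d = len(rows[0])
    sums = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            sums[i] += r[i]
            for j in range(d):
                outer[i][j] += r[i] * r[j]
    return len(rows), sums, outer


def combine(stats):
    """Add up the per-party statistics element-wise."""
    d = len(stats[0][1])
    n = sum(s[0] for s in stats)
    sums = [sum(s[1][i] for s in stats) for i in range(d)]
    outer = [[sum(s[2][i][j] for s in stats) for j in range(d)]
             for i in range(d)]
    return n, sums, outer


def covariance(n, sums, outer):
    """Sample covariance matrix computed from aggregated statistics."""
    d = len(sums)
    mean = [s / n for s in sums]
    return [[(outer[i][j] - n * mean[i] * mean[j]) / (n - 1)
             for j in range(d)] for i in range(d)]


# Hypothetical toy data, split unevenly across three parties.
party_a = [[2.0, 1.0], [3.0, 4.0]]
party_b = [[5.0, 0.0]]
party_c = [[1.0, 3.0], [4.0, 2.0], [6.0, 5.0]]

federated = covariance(*combine([local_stats(p)
                                 for p in (party_a, party_b, party_c)]))
centralized = covariance(*local_stats(party_a + party_b + party_c))
```

Because the statistics are sums over rows, `federated` and `centralized` agree up to floating-point rounding however the rows are partitioned. The principal components are the eigenvectors of this covariance matrix. In a secure setting the shared statistics would themselves have to be protected, which is the problem the abstract describes.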

PCA

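In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.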
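As a minimal sketch of the best-fit-line description above, the two-dimensional case can be worked out in closed form with the standard library alone. The function name `pca_2d` and the sample points are hypothetical. The angle formula is the standard closed form for the leading eigenvector of a 2x2 covariance matrix.

```python
import math


def pca_2d(points):
    """Return (pc1, pc2, scores) for a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the best-fit line, i.e. the direction of maximum variance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))  # perpendicular to pc1

    # Coordinates of each centered point in the new orthogonal basis.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    return pc1, pc2, scores


# Hypothetical sample points lying close to a line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
pc1, pc2, scores = pca_2d(points)

m = len(scores)
var1 = sum(a * a for a, _ in scores) / (m - 1)
var2 = sum(b * b for _, b in scores) / (m - 1)
cov12 = sum(a * b for a, b in scores) / (m - 1)
```

Here `var1` is the variance along the first principal component and `var2` the variance along the second, while `cov12` comes out as zero up to rounding. This is the sense in which the new dimensions are uncorrelated.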
