Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model in which all but one of the parties may collude. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of how the data are distributed among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and it is between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
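To make the federated setting concrete, the following is a minimal cleartext sketch, not SF-PCA's protocol. It assumes a simple covariance-aggregation approach, which the abstract does not describe: each party computes summary statistics on its own rows, the summaries are summed, and the first principal component is extracted from the aggregate. The plain summation in `aggregate` stands in for the step a secure system would perform on collectively encrypted data. All function names are hypothetical, and no encryption is performed.

```python
import random


def local_stats(rows):
    """Per-party summary on local cleartext data: count, column sums, Gram matrix X^T X."""
    d = len(rows[0])
    sums = [0.0] * d
    gram = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            sums[i] += r[i]
            for j in range(d):
                gram[i][j] += r[i] * r[j]
    return len(rows), sums, gram


def aggregate(stats):
    """Element-wise sum of the party summaries.

    Assumption: a secure system would do this on encrypted values; here it is cleartext.
    """
    d = len(stats[0][1])
    n = sum(s[0] for s in stats)
    sums = [sum(s[1][i] for s in stats) for i in range(d)]
    gram = [[sum(s[2][i][j] for s in stats) for j in range(d)] for i in range(d)]
    return n, sums, gram


def covariance(n, sums, gram):
    """Population covariance from aggregated statistics: E[x x^T] - mean mean^T."""
    d = len(sums)
    mean = [s / n for s in sums]
    return [[gram[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]


def top_component(cov, iters=500):
    """First principal component via power iteration (unit-norm vector)."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    rows = []
    for _ in range(60):
        t = rng.gauss(0, 1)
        rows.append([t, 2 * t + 0.1 * rng.gauss(0, 1), 0.1 * rng.gauss(0, 1)])

    # Deliberately uneven split of the rows among three hypothetical parties.
    parties = [rows[:5], rows[5:40], rows[40:]]
    federated = top_component(covariance(*aggregate([local_stats(p) for p in parties])))
    centralized = top_component(covariance(*local_stats(rows)))
    print("federated:  ", federated)
    print("centralized:", centralized)
```

Because the aggregated statistics are exact sums, the federated result in this sketch matches the centralized one up to floating-point rounding regardless of how the rows are split, which is the accuracy property the abstract claims for SF-PCA. Approaches that instead combine locally computed principal components generally only approximate the centralized solution.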