Eigenspace estimation is a fundamental problem in machine learning and statistics, with applications in PCA, dimension reduction, and clustering, among others. In modern machine learning, data are often assumed to be generated and owned by different organizations. Limited communication capacity and the risk of privacy breaches make eigenspace computation challenging in this setting. To address these challenges, we propose a class of algorithms called \textsf{FedPower} within the federated learning (FL) framework. \textsf{FedPower} builds on the well-known power method, alternating multiple local power iterations with a global aggregation step to improve communication efficiency. In the aggregation step, we propose to weight each local eigenvector matrix with an {\it Orthogonal Procrustes Transformation} (OPT) for better alignment. To ensure strong privacy protection, we add Gaussian noise in each iteration, adopting the notion of \emph{differential privacy} (DP). We provide convergence bounds for \textsf{FedPower} that decompose into interpretable terms corresponding to the effects of Gaussian noise, parallelization, and random sampling of local machines. Finally, we conduct experiments to demonstrate the effectiveness of the proposed algorithms.
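The alternation of local power iterations, noise injection, and OPT-aligned aggregation described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the choice of the first client's matrix as the alignment reference, the plain (unweighted) average, full client participation, and the noise scale `sigma` are all assumptions made for illustration. In particular, `sigma` is not calibrated to any differential privacy guarantee.

```python
import numpy as np

def procrustes_rotation(Y, Y_ref):
    """Orthogonal k x k matrix O minimizing ||Y @ O - Y_ref||_F."""
    U, _, Vt = np.linalg.svd(Y.T @ Y_ref)
    return U @ Vt

def fedpower_sketch(local_covs, k, rounds=20, local_steps=4, sigma=0.0, seed=0):
    """Illustrative federated power method (hypothetical helper).

    local_covs: list of symmetric d x d matrices, one per client.
    sigma: std of Gaussian noise added in each local iteration
           (an uncalibrated placeholder, not a DP-certified value).
    Returns a d x k matrix with orthonormal columns.
    """
    rng = np.random.default_rng(seed)
    d = local_covs[0].shape[0]
    # Shared random orthonormal starting point.
    Z, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(rounds):
        local_Z = []
        for A in local_covs:
            Y = Z
            # Several local power iterations between communications.
            for _ in range(local_steps):
                noise = sigma * rng.standard_normal((d, k))
                Y, _ = np.linalg.qr(A @ Y + noise)
            local_Z.append(Y)
        # Align every local basis to a reference (assumed: client 0),
        # then average and re-orthonormalize.
        ref = local_Z[0]
        aligned = [Y @ procrustes_rotation(Y, ref) for Y in local_Z]
        Z, _ = np.linalg.qr(np.mean(aligned, axis=0))
    return Z
```

The alignment step matters because each local basis is determined only up to an orthogonal rotation, so averaging unaligned bases can cancel useful signal. Calling `fedpower_sketch(covs, k=2)` on a list of per-client covariance matrices returns an orthonormal basis estimate for the leading two-dimensional eigenspace.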