In machine learning, the effectiveness of an algorithm is often limited by the quality and availability of data. Traditional approaches struggle with data sharing because of legal and privacy concerns. Federated learning addresses this challenge: it is a decentralized approach in which model training takes place on the clients, preserving privacy by keeping data local. Instead of sending raw data to a central server, clients exchange only model updates, which improves data security. In this work, we apply this framework to Sparse Principal Component Analysis (SPCA). SPCA seeks sparse component loadings while maximizing the explained variance of the data, which improves interpretability. In addition to the L1-norm regularization term of conventional SPCA, we add a smoothing function to facilitate gradient-based optimization. Moreover, to improve computational efficiency, we introduce a least-squares approximation to the original SPCA problem. This yields closed-form solutions in the optimization steps and leads to substantial computational savings. Within the federated framework, we formulate SPCA as a consensus optimization problem, which can be solved using the Alternating Direction Method of Multipliers (ADMM). Our extensive experiments cover both IID and non-IID random features across data owners. Results on synthetic and public datasets confirm the efficacy of our federated SPCA approach.
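To make the consensus formulation concrete, the following is a minimal sketch of consensus ADMM for a single sparse loading vector. It is not the authors' algorithm. It assumes a regression-style least-squares surrogate in which a fixed unit direction `a` defines local targets `X_i a`, and it uses the plain L1 proximal step (soft-thresholding) in place of the smoothing function, whose form the abstract does not specify. The function names, the penalty parameter `rho`, and the iteration count are all hypothetical choices for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def consensus_admm_sparse_loading(blocks, a, lam, rho=1.0, n_iter=300):
    """Hypothetical helper (an assumption, not the paper's method).

    Solves, by scaled-form consensus ADMM,
        min_z  sum_i 0.5 * ||X_i a - X_i z||^2 + lam * ||z||_1
    where X_i is the data block held by client i and never leaves it.
    """
    d = a.shape[0]
    n_clients = len(blocks)
    z = np.zeros(d)               # global (consensus) loading vector
    w = np.zeros((n_clients, d))  # local copies, one per client
    u = np.zeros((n_clients, d))  # scaled dual variables

    # Each client precomputes its Gram matrix locally; only w_i and u_i
    # would be sent to the server, never X_i itself.
    grams = [X.T @ X for X in blocks]
    systems = [G + rho * np.eye(d) for G in grams]
    targets = [G @ a for G in grams]

    for _ in range(n_iter):
        # Client step: closed-form solution of a ridge-like least-squares problem.
        for i in range(n_clients):
            w[i] = np.linalg.solve(systems[i], targets[i] + rho * (z - u[i]))
        # Server step: average the local variables, then shrink toward zero.
        z = soft_threshold((w + u).mean(axis=0), lam / (rho * n_clients))
        # Dual step: accumulate each client's disagreement with the consensus.
        u += w - z
    return z
```

In this sketch the client step has a closed form because the local objective is quadratic, which is the kind of benefit the least-squares approximation is said to provide. With `lam=0` the consensus vector converges to `a` itself when the stacked data has full column rank, and a sufficiently large `lam` drives every entry to zero.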