Traditional machine learning techniques require all training data to be centralized on a single server or data hub. With advances in communication technologies and the large amount of decentralized data held by many clients, collaborative machine learning that also preserves privacy has become a major research interest. In particular, federated learning (FL) learns a shared model while keeping the training data on the local clients. At the same time, in a wide range of machine learning and signal processing applications, the desired solution naturally has a structure that can be described as sparsity with respect to some dictionary. Finding such a solution can be formulated as an optimization problem with sparsity constraints, and solving it efficiently has been a primary research topic in the traditional centralized setting. In this paper, we propose a novel algorithmic framework, federated gradient matching pursuit (FedGradMP), to solve the sparsity-constrained minimization problem in the FL setting. We also generalize our algorithms to several practical FL scenarios: only a subset of clients participates in each round, the local model estimates at the clients may be inexact, or the model parameters are sparse with respect to a general dictionary. Our theoretical analysis shows that the proposed algorithms converge linearly. Numerical experiments demonstrate the potential of the proposed framework: in many important scenarios it converges quickly in both communication rounds and computation time, without sophisticated parameter tuning.
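One standard way to write the sparsity-constrained FL problem is to minimize f(x) = sum_i p_i f_i(x) subject to ||x||_0 <= s, where f_i is client i's local loss and p_i its weight. The sketch below illustrates how a gradient-matching-pursuit step (identify large gradient coordinates, merge supports, estimate on the merged support, prune to s entries) could sit inside a federated loop. It is a minimal sketch under our own assumptions, not the paper's exact FedGradMP: least-squares local losses, full client participation, exact local estimation, and a server step that averages local iterates with weights proportional to sample counts and then keeps the s largest-magnitude entries. The helper names `hard_threshold`, `local_gradmp`, and `fed_gradmp_sketch`, and all parameter values, are hypothetical.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    if s > 0:
        idx = np.argpartition(np.abs(v), -s)[-s:]
        out[idx] = v[idx]
    return out


def local_gradmp(A, y, x, s, steps):
    """Hypothetical client routine: GradMP-style iterations on the assumed
    least-squares loss f_i(x) = ||A x - y||^2 / (2 m). Requires 2 * s <= n."""
    m, n = A.shape
    for _ in range(steps):
        grad = A.T @ (A @ x - y) / m
        # Identify: indices of the 2s largest gradient entries.
        gamma = np.argpartition(np.abs(grad), -2 * s)[-2 * s:]
        # Merge with the current support.
        support = np.union1d(gamma, np.flatnonzero(x))
        # Estimate: least squares restricted to the merged support.
        b = np.zeros(n)
        b[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        # Prune back to an s-sparse vector.
        x = hard_threshold(b, s)
    return x


def fed_gradmp_sketch(clients, n, s, rounds=10, local_steps=2):
    """Hypothetical server loop: broadcast x, run local updates, take a
    sample-weighted average, then hard-threshold the average to s entries."""
    x = np.zeros(n)
    total = sum(A.shape[0] for A, _ in clients)
    for _ in range(rounds):
        local = [local_gradmp(A, y, x.copy(), s, local_steps) for A, y in clients]
        avg = sum((A.shape[0] / total) * xi for (A, _), xi in zip(clients, local))
        x = hard_threshold(avg, s)
    return x


# Usage on synthetic, noiseless data (all values are illustrative).
rng = np.random.default_rng(0)
n, s = 40, 3
x_true = np.zeros(n)
x_true[[3, 17, 29]] = [2.0, -1.5, 1.0]
clients = []
for _ in range(4):
    A = rng.standard_normal((80, n))
    clients.append((A, A @ x_true))  # each client keeps its own (A, y)
x_hat = fed_gradmp_sketch(clients, n, s)
print("max abs error:", np.max(np.abs(x_hat - x_true)))
```

Only the iterates x travel between server and clients in this sketch; the raw (A, y) pairs never leave a client, which is the property FL is meant to preserve. Averaging dense-on-the-union iterates and pruning afterwards keeps the broadcast model s-sparse at every round.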