Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other. However, it suffers from leakage of private information through the uploaded models. In addition, as the model size grows, the training latency increases due to limited transmission bandwidth, and the model performance degrades when differential privacy (DP) protection is applied. In this paper, we propose a gradient-sparsification-empowered FL framework over wireless channels, in order to improve training efficiency without sacrificing convergence performance. Specifically, we first design a random sparsification algorithm that retains a fraction of the gradient elements in each client's local training, thereby mitigating the performance degradation induced by DP and reducing the number of parameters transmitted over wireless channels. Then, we analyze the convergence bound of the proposed algorithm by modeling a non-convex FL problem. Next, we formulate a time-sequential stochastic optimization problem that minimizes the derived convergence bound under constraints on the transmit power, the average transmission delay, and each client's DP requirement. Utilizing the Lyapunov drift-plus-penalty framework, we develop an analytical solution to the optimization problem. Extensive experiments on three real-life datasets demonstrate the effectiveness of the proposed algorithm. We show that the proposed algorithm can fully exploit the interworking between communication and computation to outperform the baselines, namely the random scheduling, round-robin, and delay-minimization algorithms.
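The random sparsification step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function names `random_sparsify` and `clip_and_perturb`, the uniform choice of retained coordinates, the L2 clipping threshold `clip_norm`, and the Gaussian noise level `noise_std` are hypothetical and not taken from the paper. The sketch shows one plausible reading of why sparsification helps under DP: noise is added only to the retained coordinates, and only those coordinates need to be transmitted.

```python
import math
import random


def random_sparsify(grad, keep_fraction, rng):
    """Keep a uniformly random subset of coordinates and zero the rest.

    Assumption: the retained set is drawn uniformly without replacement.
    """
    d = len(grad)
    k = max(1, round(keep_fraction * d))
    kept = set(rng.sample(range(d), k))
    sparse = [g if i in kept else 0.0 for i, g in enumerate(grad)]
    return sparse, sorted(kept)


def clip_and_perturb(sparse_grad, kept, clip_norm, noise_std, rng):
    """Clip to an L2 ball of radius clip_norm, then add Gaussian noise.

    Assumption: noise is added on the retained coordinates only, so the
    dropped coordinates stay exactly zero and carry no noise.
    """
    norm = math.sqrt(sum(g * g for g in sparse_grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    out = [0.0] * len(sparse_grad)
    for i in kept:
        out[i] = sparse_grad[i] * scale + rng.gauss(0.0, noise_std)
    return out


# Hypothetical usage: a 20-dimensional gradient with 25% of entries retained.
rng = random.Random(0)
grad = [rng.uniform(-1.0, 1.0) for _ in range(20)]
sparse, kept = random_sparsify(grad, keep_fraction=0.25, rng=rng)
noisy = clip_and_perturb(sparse, kept, clip_norm=1.0, noise_std=0.1, rng=rng)
print(len(kept), "of", len(grad), "coordinates would be transmitted")
```

In this sketch a client would upload only the retained indices and their perturbed values, which is how the number of transmitted parameters shrinks with `keep_fraction`. How the paper actually calibrates the noise to the DP requirement, and how it chooses the sparsification rate per round, is not stated in the abstract and is not modeled here.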