Improved Convergence Analysis and SNR Control Strategies for Federated Learning in the Presence of Noise

We propose an improved convergence analysis for federated learning (FL) with imperfect, noisy uplink and downlink communications. Such imperfect communication arises in practical deployments of FL over emerging communication systems and protocols. Our analysis demonstrates, for the first time, that uplink and downlink noise affect FL asymmetrically: downlink noise degrades the convergence of FL algorithms more severely than uplink noise. Using this insight, we propose improved signal-to-noise ratio (SNR) control strategies that, once negligible higher-order terms are discarded, achieve a convergence rate similar to that of FL over a perfect, noise-free channel while consuming significantly less power than existing solutions. In particular, we establish that to maintain the $O(\frac{1}{\sqrt{K}})$ convergence rate of noise-free FL, the uplink and downlink noise must be scaled down by $\Omega({\sqrt{k}})$ and $\Omega({k})$, respectively, where $k$ denotes the communication round, $k=1,\dots, K$. Our theoretical result has two further benefits: first, it does not rely on the somewhat unrealistic assumption of bounded client dissimilarity; second, it requires only smooth non-convex loss functions, a function class better suited to modern machine learning and deep learning models. We also perform extensive empirical analysis to verify our theoretical findings.
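
As an illustration of the scaling rule stated above, the following minimal Python sketch computes a per-round noise schedule. It rests on several assumptions that are not taken from the paper: "noise" is interpreted as a noise standard deviation, the initial levels `sigma_up0` and `sigma_down0` and all function names are hypothetical, and the scaling factors are taken to be exactly $\sqrt{k}$ and $k$ with constant 1, which is only one schedule compatible with the $\Omega(\sqrt{k})$ and $\Omega(k)$ requirements.

```python
import math


def uplink_noise_level(k: int, sigma_up0: float = 1.0) -> float:
    """Hypothetical uplink noise level at round k, scaled down by sqrt(k)."""
    if k < 1:
        raise ValueError("communication rounds are numbered from 1")
    return sigma_up0 / math.sqrt(k)


def downlink_noise_level(k: int, sigma_down0: float = 1.0) -> float:
    """Hypothetical downlink noise level at round k, scaled down by k."""
    if k < 1:
        raise ValueError("communication rounds are numbered from 1")
    return sigma_down0 / k


def noise_schedule(K: int, sigma_up0: float = 1.0, sigma_down0: float = 1.0):
    """Return (round, uplink level, downlink level) for rounds k = 1, ..., K."""
    return [
        (k, uplink_noise_level(k, sigma_up0), downlink_noise_level(k, sigma_down0))
        for k in range(1, K + 1)
    ]


if __name__ == "__main__":
    # Assumed initial noise levels of 1.0 on both links, for illustration only.
    for k, up, down in noise_schedule(4):
        print(f"round {k}: uplink {up:.3f}, downlink {down:.3f}")
```

Under these assumptions, by round $k=4$ the uplink level has dropped to half of its initial value and the downlink level to a quarter, which reflects the stated asymmetry: the downlink noise has to shrink faster.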
