Over-the-air (OTA) federated learning (FL) is well recognized as a scalable paradigm that exploits the waveform-superposition property of the wireless multiple-access channel to aggregate model updates in a single channel use. Existing OTA-FL designs largely enforce zero-bias model updates, either by assuming \emph{homogeneous} wireless conditions (equal path loss across devices) or by forcing unbiasedness to guarantee convergence. Under \emph{heterogeneous} wireless conditions, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that admit a structured, time-invariant model bias while enabling reduced-variance updates. We derive a finite-time stationarity bound (on the expected time-averaged squared gradient norm) that explicitly exposes a bias-variance trade-off. To optimize this trade-off, we formulate a non-convex joint OTA power-control problem and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical channel state information (CSI) at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence through an optimized bias and improves generalization over prior OTA-FL baselines.
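As a toy numerical illustration of the weakest-device bottleneck described above (this is not the paper's actual scheme; the scalar channel model, variable names, and power normalizations here are all assumptions for exposition), zero-bias channel-inversion aggregation can be contrasted with a biased full-power aggregation in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 10, 5                       # number of devices, model dimension
h = rng.uniform(0.1, 1.0, K)       # heterogeneous channel gains (path loss)
g = rng.normal(size=(K, d))        # local stochastic gradients
noise_std = 0.1                    # receiver noise level
P = 1.0                            # per-device transmit amplitude budget

# Zero-bias (channel-inversion) aggregation: device k pre-scales by
# alpha / h_k, so the common amplitude alpha is capped by the weakest
# channel, shrinking the received signal relative to the noise.
alpha = np.sqrt(P) * h.min()
y_unbiased = (h[:, None] * (alpha / h)[:, None] * g).sum(0)
y_unbiased += noise_std * rng.normal(size=d)
est_unbiased = y_unbiased / (alpha * K)   # unbiased estimate of mean gradient

# Biased aggregation: every device transmits at full amplitude, trading a
# deterministic re-weighting bias (channel-weighted rather than uniform
# average) for a larger effective amplitude and lower noise variance.
beta = np.sqrt(P)
y_biased = (h[:, None] * beta * g).sum(0) + noise_std * rng.normal(size=d)
est_biased = y_biased / (beta * h.sum())  # weights sum to 1 but are unequal

# The effective receive amplitude of the biased scheme is never smaller:
# beta * sum(h) >= alpha * K, with equality only for homogeneous channels.
```

Under heterogeneity (`h.min() << h.mean()`), the biased estimate sees a much smaller effective noise scale, which is exactly the bias-variance trade-off the stationarity bound and SCA power-control design optimize.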