Hierarchical federated learning (HFL) has emerged as a key architecture for large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before reaching the cloud. In these environments, uplink bandwidth and latency impose strict communication limits, making aggressive gradient compression essential. One-bit methods such as sign-based stochastic gradient descent (SignSGD) offer an attractive solution in flat federated settings, but existing theory and algorithms do not naturally extend to hierarchical ones. In particular, the interaction between majority-vote aggregation at the edge layer and model aggregation at the cloud layer, and its impact on end-to-end performance, remains uncharacterized. To bridge this gap, we propose a highly communication-efficient sign-based HFL framework and formulate it for nonconvex learning: devices send only signed stochastic gradients, edge servers aggregate them by majority vote, and the cloud periodically averages the resulting edge models, using downlink quantization to broadcast the global model. We introduce the resulting scalable HFL algorithm, HierSignSGD, and provide a convergence analysis for SignSGD in a hierarchical setting. Our core technical contribution is a characterization of how biased sign compression, two-level aggregation intervals, and inter-cluster heterogeneity collectively affect convergence. Numerical experiments under homogeneous and heterogeneous data splits show that HierSignSGD, despite employing extreme compression, achieves accuracy comparable to or better than full-precision stochastic gradient descent while substantially reducing communication cost, and remains robust under aggressive downlink sparsification.
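The two-level procedure described above can be illustrated with a minimal NumPy sketch. This is a hypothetical toy simulation, not the paper's implementation: it uses synthetic quadratic losses per device, illustrative hyperparameters (learning rate, aggregation intervals), and omits the downlink quantization step for brevity. Devices transmit only the sign of a noisy gradient (one bit per coordinate), each edge server aggregates by majority vote, and the cloud periodically averages the edge models.

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_vote(signs):
    """Coordinate-wise majority vote over (num_devices, dim) array of ±1.

    Ties are broken toward +1 via the +0.5 offset.
    """
    return np.sign(np.sum(signs, axis=0) + 0.5)

# Toy per-device loss: f_i(w) = 0.5 * ||w - target_i||^2, so grad = w - target_i.
# Devices within an edge cluster share a common base target plus a small
# perturbation, giving mild inter-cluster heterogeneity.
dim, devices_per_edge, num_edges = 8, 5, 3
lr, local_steps, cloud_period, rounds = 0.05, 4, 2, 50

base = rng.normal(size=dim)
targets = base + 0.3 * rng.normal(size=(num_edges, devices_per_edge, dim))

global_w = np.zeros(dim)
edge_models = np.tile(global_w, (num_edges, 1))

for r in range(rounds):
    for e in range(num_edges):
        for _ in range(local_steps):
            # Each device sends only sign(stochastic gradient): 1 bit/coordinate.
            grads = edge_models[e] - targets[e] \
                + 0.1 * rng.normal(size=(devices_per_edge, dim))
            signs = np.sign(grads)
            signs[signs == 0] = 1.0
            # Edge server applies the majority-vote direction.
            edge_models[e] -= lr * majority_vote(signs)
    if (r + 1) % cloud_period == 0:
        # Cloud averages edge models and broadcasts the global model
        # (downlink quantization omitted in this sketch).
        global_w = edge_models.mean(axis=0)
        edge_models = np.tile(global_w, (num_edges, 1))

# Distance from the final global model to the grand-mean target.
final_dist = np.linalg.norm(global_w - targets.mean(axis=(0, 1)))
init_dist = np.linalg.norm(targets.mean(axis=(0, 1)))
print(f"initial distance: {init_dist:.3f}, final distance: {final_dist:.3f}")
```

Even in this crude setup, the majority-vote direction drives each edge model toward its cluster's coordinate-wise median, and the periodic cloud averaging pulls the global model close to the grand-mean optimum, mirroring the bias and heterogeneity effects analyzed in the paper.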