Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over traditional federated learning (FL) built on a "star-topology" architecture. However, HFL still imposes significant computation, communication, and storage burdens at the edge, especially when training a large-scale model over resource-constrained Internet of Things (IoT) devices. In this paper, we propose hierarchical independent submodel training (HIST), a new FL methodology that aims to address these issues in hierarchical settings. The key idea behind HIST is a hierarchical version of model partitioning: in each round, we partition the global model into disjoint submodels and distribute them across different cells, so that each cell is responsible for training only one partition of the full model. This allows each client to reduce its computation and storage costs while alleviating the communication load throughout the hierarchy. We characterize the convergence behavior of HIST for non-convex loss functions under mild assumptions, showing the impact of several attributes (e.g., the number of cells and the local and global aggregation frequencies) on the performance-efficiency tradeoff. Finally, through numerical experiments, we verify that HIST substantially reduces communication costs while achieving the same target testing accuracy.
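To make the partitioning idea concrete, the following is a minimal sketch of one round of disjoint model partitioning across cells. It is an illustration under stated assumptions, not the HIST algorithm itself: the model is treated as a flat list of parameters, the partition is drawn uniformly at random, and the names `partition_indices`, `extract_submodel`, `toy_cell_update`, and `merge_submodels` are hypothetical. The in-cell training (local updates and aggregation among a cell's clients) is replaced by a placeholder.

```python
import random


def partition_indices(num_params, num_cells, rng):
    """Split parameter indices 0..num_params-1 into num_cells disjoint groups.

    Assumption: a uniformly random partition; the abstract does not specify
    how the global model is actually partitioned.
    """
    indices = list(range(num_params))
    rng.shuffle(indices)
    # Strided slicing yields groups whose sizes differ by at most one.
    return [sorted(indices[c::num_cells]) for c in range(num_cells)]


def extract_submodel(global_model, idx):
    """Return the submodel (index -> value) that one cell receives."""
    return {i: global_model[i] for i in idx}


def toy_cell_update(submodel, cell_id):
    """Placeholder for training within a cell (hypothetical stand-in).

    Adds a cell-specific constant so the effect of each cell is visible.
    """
    return {i: v + (cell_id + 1) for i, v in submodel.items()}


def merge_submodels(global_model, submodels):
    """Write each cell's updated partition back into the global model."""
    updated = list(global_model)
    for sub in submodels:
        for i, v in sub.items():
            updated[i] = v
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    global_model = [0.0] * 10
    num_cells = 3

    parts = partition_indices(len(global_model), num_cells, rng)
    trained = [
        toy_cell_update(extract_submodel(global_model, p), c)
        for c, p in enumerate(parts)
    ]
    global_model = merge_submodels(global_model, trained)
    print(parts)
    print(global_model)
```

Because the partitions are disjoint, each cell stores and transmits only its own share of the parameters, and merging reduces to writing each partition back into place with no averaging across cells in this simplified setting.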