Large machine learning models trained on diverse data have recently seen unprecedented success. Federated learning enables training on private data that may otherwise be inaccessible, such as domain-specific datasets decentralized across many clients. However, federated learning can be difficult to scale to large models when clients have limited resources. This challenge often results in a trade-off between model size and access to diverse data. To mitigate this issue and facilitate training of large models on edge devices, we introduce a simple yet effective strategy, Federated Layer-wise Learning, to simultaneously reduce per-client memory, computation, and communication costs. Clients train just a single layer each round, reducing resource costs considerably with minimal performance degradation. We also introduce Federated Depth Dropout, a complementary technique that randomly drops frozen layers during training, to further reduce resource usage. Coupling these two techniques enables us to effectively train significantly larger models on edge devices. Specifically, we reduce training memory usage by 5x or more in federated self-supervised representation learning and demonstrate that performance in downstream tasks is comparable to conventional federated self-supervised learning.
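The per-round behaviour described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bottom-up schedule with a fixed number of rounds per layer, the independent per-layer drop probability, the unweighted averaging, and all names (`ClientPlan`, `plan_round`, `federated_average`, `rounds_per_layer`, `drop_rate`) are assumptions introduced only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ClientPlan:
    active_layer: int          # the single layer a client trains this round
    kept_frozen: List[int]     # frozen layers still run in the forward pass
    dropped_frozen: List[int]  # frozen layers skipped this round


def plan_round(round_idx: int, num_layers: int, rounds_per_layer: int,
               drop_rate: float, rng: random.Random) -> ClientPlan:
    """Pick the trainable layer and which frozen layers to drop.

    Assumed schedule: layers are trained bottom-up, each for
    `rounds_per_layer` rounds; layers below the active one are frozen.
    """
    active = min(round_idx // rounds_per_layer, num_layers - 1)
    kept, dropped = [], []
    for layer in range(active):  # only already-trained (frozen) layers
        # Assumed rule: drop each frozen layer independently.
        if rng.random() < drop_rate:
            dropped.append(layer)
        else:
            kept.append(layer)
    return ClientPlan(active, kept, dropped)


def federated_average(updates: List[List[float]]) -> List[float]:
    """Unweighted mean of per-client updates for the active layer only."""
    n = len(updates)
    return [sum(values) / n for values in zip(*updates)]


if __name__ == "__main__":
    rng = random.Random(0)
    for r in range(8):
        print(r, plan_round(r, num_layers=4, rounds_per_layer=2,
                            drop_rate=0.5, rng=rng))
```

In this sketch only the active layer's parameters would be updated and sent back to the server, which is where the communication saving comes from. Skipping dropped frozen layers shortens the forward pass, so a client needs less memory and compute in that round.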