Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.
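The sub-model extraction described above can be illustrated with a minimal sketch. The abstract does not give the exact scoring rule or how profiling maps to a sub-model size, so everything here is an assumption for illustration: the per-neuron score (largest relative weight change in a row), the helper names `neuron_update_scores`, `invariant_keep_mask`, and `keep_ratio_for`, and the proportional sizing rule are all hypothetical and are not taken from the FLuID implementation.

```python
import numpy as np

def neuron_update_scores(old_w, new_w, eps=1e-12):
    """Largest relative weight change per neuron (one row = one neuron).

    Assumed scoring rule; the source only says the sub-model is chosen
    from a weight update threshold.
    """
    rel = np.abs(new_w - old_w) / (np.abs(old_w) + eps)
    return rel.max(axis=1)

def invariant_keep_mask(old_w, new_w, keep_ratio):
    """Boolean mask over neurons: True = kept in the straggler's sub-model."""
    scores = neuron_update_scores(old_w, new_w)
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    # Keep the n_keep neurons with the largest updates; the remaining
    # neurons changed least ("invariant") and are dropped.
    keep_idx = np.argsort(scores, kind="stable")[-n_keep:]
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep_idx] = True
    return mask

def keep_ratio_for(straggler_time, target_time):
    """Hypothetical profiling rule: shrink in proportion to the slowdown."""
    return float(np.clip(target_time / straggler_time, 0.1, 1.0))

# Toy layer with 4 neurons, as observed on a non-straggler client.
old_w = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
new_w = np.array([[1.5, 1.0], [1.0, 1.001], [0.8, 1.0], [1.0, 1.0]])

# The straggler takes 20 s per round against a 10 s target: keep half.
ratio = keep_ratio_for(straggler_time=20.0, target_time=10.0)
mask = invariant_keep_mask(old_w, new_w, ratio)
sub_w = new_w[mask]  # weights sent to the straggler
```

In this toy layer, neurons 0 and 2 have the largest updates and are kept, while neurons 1 and 3 barely change and are dropped. Re-running the profiling step each round would let the keep ratio follow changes in which devices are stragglers, which is the dynamic behavior the abstract describes.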