In real-world Federated Learning (FL) deployments, data distributions on devices that participate in training evolve over time. This leads to asynchronous data drift, where different devices shift at different times and toward different distributions. Mitigating such drift is challenging: frequent retraining incurs high computational cost on resource-constrained devices, while infrequent retraining degrades performance on drifting devices. We propose DriftGuard, a federated continual learning framework that efficiently adapts to asynchronous data drift. DriftGuard adopts a Mixture-of-Experts (MoE) inspired architecture that separates shared parameters, which capture globally transferable knowledge, from local parameters that adapt to group-specific distributions. This design enables two complementary retraining strategies: (i) global retraining, which updates the shared parameters when system-wide drift is identified, and (ii) group retraining, which selectively updates local parameters for clusters of devices identified via MoE gating patterns, without sharing raw data. Experiments across multiple datasets and models show that DriftGuard matches or exceeds state-of-the-art accuracy while reducing total retraining cost by up to 83%. As a result, it achieves the highest accuracy per unit retraining cost, improving over the strongest baseline by up to 2.3x. DriftGuard is available for download from https://github.com/blessonvar/DriftGuard.
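To make the two retraining strategies concrete, the following is a minimal sketch, not DriftGuard's actual implementation. The abstract does not specify how gating patterns are compared or how system-wide drift is identified, so everything here is an assumption made for illustration: each device is summarized by its average MoE gating vector, devices are grouped greedily by cosine similarity, and global retraining is triggered when a fixed fraction of devices report drift. The names `group_by_gating` and `plan_retraining` and the parameters `threshold` and `global_fraction` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_gating(gating, threshold=0.9):
    """Greedily group devices whose average gating vectors are similar.

    `gating` maps a device id to its average expert-gating vector. Only these
    summaries are compared, never raw data. The greedy rule and the threshold
    are assumptions for illustration, not the method from the paper.
    """
    groups = []  # list of (representative_vector, member_ids)
    for device in sorted(gating):
        vector = gating[device]
        for representative, members in groups:
            if cosine(representative, vector) >= threshold:
                members.append(device)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((vector, [device]))
    return [members for _, members in groups]


def plan_retraining(groups, drifted, global_fraction=0.5):
    """Choose between global retraining, group retraining, or no retraining.

    Assumed rule: if at least `global_fraction` of all devices have drifted,
    treat the drift as system-wide and update the shared parameters;
    otherwise update local parameters only for groups containing a drifted device.
    """
    all_devices = [d for group in groups for d in group]
    n_drifted = sum(1 for d in all_devices if d in drifted)
    if all_devices and n_drifted / len(all_devices) >= global_fraction:
        return {"mode": "global", "groups": []}
    affected = [g for g in groups if any(d in drifted for d in g)]
    if affected:
        return {"mode": "group", "groups": affected}
    return {"mode": "none", "groups": []}


# Hypothetical gating summaries for four devices and three experts.
gating = {
    "dev_a": [0.80, 0.10, 0.10],
    "dev_b": [0.75, 0.15, 0.10],
    "dev_c": [0.10, 0.10, 0.80],
    "dev_d": [0.05, 0.15, 0.80],
}
groups = group_by_gating(gating)
print(groups)
print(plan_retraining(groups, drifted={"dev_c"}))
print(plan_retraining(groups, drifted={"dev_a", "dev_c", "dev_d"}))
```

In this toy setting, drift on a single device triggers retraining only for the group that contains it, while drift on most devices falls back to global retraining of the shared parameters. A real system would also need a drift detector and a cost model, both of which are outside the scope of this sketch.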