Federated Learning (FL) enables the utilization of vast, previously inaccessible data sources. At the same time, pre-trained Language Models (LMs) have taken the world by storm, and for good reason: they exhibit remarkable emergent abilities and are readily adapted to downstream tasks. This opens one of the most exciting frontiers in FL: fine-tuning LMs. Yet a persistent challenge in FL is the frequent, rigid communication of parameters, a problem magnified by the sheer size of these contemporary models. The FedOpt family of algorithms has become the go-to approach for FL, relying on fixed but arbitrary intervals for model exchanges. Recently, the FDA algorithm prescribed a dynamic approach that monitors training progress. However, it introduced a hard-to-calibrate parameter and imposed a rigid synchronization scheme. In this work, we address these limitations by proposing the FDA-Opt family of algorithms, a unified generalization of both FDA and FedOpt. Our experimental evaluation focuses on fine-tuning LMs on downstream NLP tasks and demonstrates that FDA-Opt outperforms FedOpt even when FDA-Opt is configured with hyper-parameters tuned specifically for FedOpt. In other words, we show that FDA-Opt is a practical, drop-in replacement for FedOpt in modern FL libraries and systems: it requires no additional configuration and delivers superior performance out of the box.