Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It remains unclear how to combine these two techniques effectively so as to obtain the benefits of both. In this paper, we find that asynchrony introduces an implicit bias into momentum updates. To address this problem, we propose momentum approximation, which minimizes the bias by finding an optimal weighted average of all historical model updates. Momentum approximation is compatible with secure aggregation and differential privacy, and can be easily integrated into production FL systems at a minor communication and storage cost. We empirically demonstrate that, on benchmark FL datasets, momentum approximation achieves a $1.15 \textrm{--}4\times$ speedup in convergence compared with existing asynchronous FL optimizers with momentum.
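Framing momentum as "a weighted average of all historical model updates" rests on a standard identity. With server momentum $m_t = \beta m_{t-1} + \Delta_t$ and $m_0 = 0$, the applied update equals $\sum_{i=1}^{t} \beta^{t-i} \Delta_i$. The minimal sketch below checks that identity numerically. It assumes synchronous, non-stale updates $\Delta_i$ and fixed geometric weights. It does not implement the bias-minimizing weight search that momentum approximation performs, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def momentum_recursive(updates: Sequence[Vector], beta: float) -> Vector:
    """Server momentum m_t = beta * m_{t-1} + delta_t, starting from m_0 = 0."""
    m = [0.0] * len(updates[0])
    for delta in updates:
        m = [beta * mi + di for mi, di in zip(m, delta)]
    return m


def weighted_history(updates: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted combination sum_i w_i * delta_i over all historical updates."""
    if len(updates) != len(weights):
        raise ValueError("need exactly one weight per historical update")
    out = [0.0] * len(updates[0])
    for w, delta in zip(weights, updates):
        out = [oi + w * di for oi, di in zip(out, delta)]
    return out


def geometric_weights(t: int, beta: float) -> List[float]:
    """Weights beta^(t-i) for i = 1..t, which reproduce the recursion above.

    Hypothetical illustration only: momentum approximation would instead
    choose the weights to minimize the bias introduced by asynchrony.
    """
    return [beta ** (t - i) for i in range(1, t + 1)]


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    beta = 0.5
    print(momentum_recursive(history, beta))
    print(weighted_history(history, geometric_weights(len(history), beta)))
```

Both calls print `[1.25, 2.0]`. Because the recursive form is just one particular choice of weights over the update history, replacing `geometric_weights` with a different weight vector is the degree of freedom that a weighted-average formulation exposes.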