Continual Federated Learning (CFL) combines Federated Learning (FL), the decentralized learning of a central model on a number of client devices that may not communicate their data, and Continual Learning (CL), the learning of a model from a continual stream of data without keeping the entire history. In CL, the main challenge is \textit{forgetting} what was learned from past data. Replay-based algorithms, which keep a small pool of past training data, are effective at reducing forgetting. However, prior work has applied only simple replay sample selection strategies to CFL, and no previous work has explored coordination among clients for better sample selection. To bridge this gap, we adapt a replay sample selection objective based on loss gradient diversity to CFL and propose a new relaxation-based method for selecting samples that optimize this objective. Next, we propose a practical algorithm that coordinates gradient-based replay sample selection across clients without communicating private data. We benchmark our coordinated and uncoordinated replay sample selection algorithms against random sampling-based baselines, using language models trained on a large-scale, de-identified, real-world text dataset. We show that gradient-based sample selection methods both boost performance and reduce forgetting compared to random sampling methods, with our coordination method showing early gains in the low replay size regime (i.e., when the budget for storing past data is small).
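The abstract does not spell out the gradient-diversity objective or its relaxation, so the sketch below only illustrates the general idea behind gradient-based replay sample selection on a single client. It assumes each candidate sample is represented by its (flattened) loss gradient as a list of floats, and it assumes an objective of minimizing the summed pairwise cosine similarity among the selected gradients, optimized with a simple greedy heuristic. This greedy step is a stand-in chosen for illustration; it is not the relaxation-based selection or the cross-client coordination described above, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two gradient vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def select_diverse(grads: List[Sequence[float]], budget: int) -> List[int]:
    """Greedily pick up to `budget` indices whose gradients are mutually dissimilar.

    Assumed objective (illustrative only): minimize the sum of pairwise cosine
    similarities among the selected gradients. Greedy, not the paper's relaxation.
    """
    if budget <= 0 or not grads:
        return []
    budget = min(budget, len(grads))
    selected = [0]  # hypothetical seed choice: start from the first candidate
    remaining = set(range(1, len(grads)))
    while len(selected) < budget:
        # Add the candidate whose summed similarity to the current pool is smallest;
        # ties are broken by the lower index so the result is deterministic.
        best = min(
            remaining,
            key=lambda i: (sum(cosine(grads[i], grads[j]) for j in selected), i),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Sample 1's gradient nearly duplicates sample 0's, so it is skipped.
    print(select_diverse([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], 2))  # → [0, 2]
```

With a replay budget of two, the near-duplicate gradient of sample 1 adds little new information, so the greedy step keeps the orthogonal sample 2 instead. Under the assumed objective, this is the intuition for why diversity-based selection can outperform random sampling when the replay budget is small.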