Cyclical Weight Consolidation: Towards Solving Catastrophic Forgetting in Serial Federated Learning

Federated Learning (FL) has gained attention as a way to address data scarcity and privacy concerns. While parallel FL algorithms such as FedAvg exhibit remarkable performance, they face challenges in scenarios with heterogeneous network speeds and raise concerns about centralized control, especially in multi-institutional collaborations such as those in the medical domain. Serial FL offers an alternative that circumvents these challenges by transferring model updates serially between devices in a cyclical manner. Nevertheless, it is deemed inferior to parallel FL in that (1) its performance fluctuates undesirably, and (2) it converges to a lower plateau, particularly on non-IID data. These phenomena are attributed to catastrophic forgetting, i.e., the loss of knowledge acquired at previously visited sites. In this paper, to overcome the fluctuation and low efficiency of this iterative learning-and-forgetting process, we introduce cyclical weight consolidation (CWC), a straightforward yet potent approach tailored specifically to serial FL. CWC employs a consolidation matrix to regulate local optimization. This matrix tracks the significance of each parameter to the overall federation throughout the entire training trajectory, preventing abrupt changes to significant weights. When a site is revisited, the old memory decays so that new information can be incorporated, which maintains adaptability. Our comprehensive evaluations demonstrate that, in various non-IID settings, CWC mitigates the fluctuation of the original serial FL approach and consistently and significantly improves the converged performance. The improved performance is comparable to or better than that of vanilla parallel FL.
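The abstract describes CWC only at a high level, so the following is a minimal, hypothetical sketch of the general idea of consolidation-regularized serial FL, not the paper's actual update rule. It assumes a linear model trained with mean squared error, a diagonal consolidation matrix whose per-site importance is the mean squared input feature (the diagonal of the MSE Hessian), a quadratic penalty that anchors important weights to the model received from the previous site, and exponential decay of the accumulated importance after every visit (one possible reading of "old memory undergoes decay"). All function and parameter names (`serial_fl_with_consolidation`, `local_update`, `site_importance`, `lam`, `decay`) are illustrative.

```python
"""Hypothetical sketch of consolidation-regularized serial FL.

Assumptions (not from the paper): linear model + MSE, diagonal importance
= mean of x_i^2 per site, quadratic penalty anchored at the incoming model,
and exponential decay of the accumulated importance after each visit.
"""
from typing import List, Sequence, Tuple

Vector = List[float]
Dataset = Sequence[Tuple[Sequence[float], float]]  # (features, target) pairs


def mse_grad(w: Vector, data: Dataset) -> Vector:
    """Gradient of 0.5 * mean((w . x - y)^2) with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
    return g


def site_importance(data: Dataset, dim: int) -> Vector:
    """Assumed diagonal importance of each weight at one site: mean of x_i^2."""
    imp = [0.0] * dim
    for x, _ in data:
        for i, xi in enumerate(x):
            imp[i] += xi * xi / len(data)
    return imp


def local_update(w_in: Vector, data: Dataset, omega: Vector,
                 lam: float, lr: float, steps: int) -> Vector:
    """Gradient descent on the local MSE plus the penalty
    0.5 * lam * sum_i omega_i * (w_i - w_in_i)^2, which discourages
    abrupt changes to weights deemed important by earlier sites."""
    w = list(w_in)
    for _ in range(steps):
        g = mse_grad(w, data)
        w = [wi - lr * (gi + lam * oi * (wi - ai))
             for wi, gi, oi, ai in zip(w, g, omega, w_in)]
    return w


def serial_fl_with_consolidation(sites: Sequence[Dataset], dim: int,
                                 rounds: int, lam: float = 1.0,
                                 decay: float = 0.5, lr: float = 0.1,
                                 steps: int = 50) -> Tuple[Vector, Vector]:
    """Pass the model through the sites in a fixed cycle for `rounds` cycles."""
    w = [0.0] * dim
    omega = [0.0] * dim  # accumulated importance (diagonal consolidation matrix)
    for _ in range(rounds):
        for data in sites:
            w = local_update(w, data, omega, lam, lr, steps)
            # Old importance decays, then the current site's importance is added.
            new = site_importance(data, dim)
            omega = [decay * o + n for o, n in zip(omega, new)]
    return w, omega


if __name__ == "__main__":
    # Two toy sites with different (non-IID) feature patterns.
    site_a = [([1.0, 1.0], 3.0)]
    site_b = [([1.0, 0.0], 1.0)]
    w, omega = serial_fl_with_consolidation([site_a, site_b], dim=2, rounds=40)
    print("weights:", w, "importance:", omega)
```

In this toy setting both sites are consistent with the weights [1, 2], and the cycle settles there while the decayed importance stays bounded (at most the largest per-site importance divided by 1 - `decay`). Setting `lam=0` recovers plain serial fine-tuning, which makes it easy to compare behavior with and without consolidation on other data.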

翻译：联邦学习（FL）因解决数据稀缺性和隐私问题而受到关注。虽然FedAvg等并行FL算法表现出显著性能，但在网络速度差异较大及对集中控制存在顾虑的场景下面临挑战，尤其在医学领域等多机构协作中。串行FL提供了一种替代方案，通过以循环方式在设备间串行传输模型更新来规避这些挑战。然而，它被认为不如并行FL，原因在于：(1)其性能表现出不良波动，(2)尤其在处理非独立同分布数据时收敛于较低水平。这一现象被归因于先前站点知识丢失导致的灾难性遗忘。本文为克服迭代学习与遗忘过程中的波动和低效率问题，引入了循环权重巩固（CWC）——一种专为串行FL设计的简洁而有效的方法。CWC通过巩固矩阵调节局部优化，该矩阵追踪整个训练过程中每个参数对全局联邦的重要性，从而防止重要权重的突变。在重访过程中，为保持适应性，旧记忆会进行衰减以融入新信息。我们的全面评估表明，在各种非独立同分布设置下，CWC缓解了原始串行FL方法的波动行为，并持续且显著地提升了收敛性能。改进后的性能可媲美或优于原始并行方法。