In Split Federated Learning (SFL), the clients collaboratively train a model with the help of a server by splitting the model into two parts. Part-1 is trained locally at each client, and its weights are aggregated at the end of each round. Part-2 is trained at a server that sequentially processes the intermediate activations received from each client. We study the phenomenon of catastrophic forgetting (CF) in SFL in the presence of data heterogeneity. In particular, due to the nature of SFL, the local updates of Part-1 may drift away from the global optimum, while Part-2 is sensitive to the processing sequence, similar to forgetting in continual learning (CL). Specifically, we observe that the trained model performs better on classes (labels) seen toward the end of the sequence. We investigate this phenomenon with emphasis on key aspects of SFL, such as the processing order at the server and the choice of cut layer. Based on our findings, we propose Hydra, a novel mitigation method inspired by multi-head neural networks and adapted to the SFL setting. Extensive numerical evaluations show that Hydra outperforms both baselines and methods from the literature.
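To make the training protocol concrete, the following is a minimal sketch of one SFL round in PyTorch. The model sizes, the cut-layer placement, the synthetic local batches, and the plain FedAvg aggregation of Part-1 are illustrative assumptions, not the paper's exact implementation; the sketch only shows the structure that gives rise to the sequential sensitivity discussed above, namely that Part-2 is updated client by client within a round.

```python
# Sketch of one SFL round (hedged): client-side Part-1 models, a shared
# server-side Part-2 processed sequentially, then FedAvg over Part-1.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
NUM_CLIENTS, BATCH, IN_DIM, HID, NUM_CLASSES = 4, 8, 32, 16, 10

# Part-1 lives on each client (up to the assumed cut layer); Part-2 on the server.
def make_part1():
    return nn.Sequential(nn.Linear(IN_DIM, HID), nn.ReLU())

part2 = nn.Sequential(nn.Linear(HID, NUM_CLASSES))      # server-side model
client_part1 = [make_part1() for _ in range(NUM_CLIENTS)]
opt_part2 = torch.optim.SGD(part2.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# --- one SFL round: the server processes the clients sequentially ---
for cid in range(NUM_CLIENTS):
    x = torch.randn(BATCH, IN_DIM)                      # client cid's local batch
    y = torch.randint(0, NUM_CLASSES, (BATCH,))         # heterogeneous labels in practice
    opt1 = torch.optim.SGD(client_part1[cid].parameters(), lr=0.1)

    act = client_part1[cid](x)                          # client forward to the cut layer
    act_server = act.detach().requires_grad_(True)      # activations "sent" to the server

    logits = part2(act_server)                          # server forward + backward
    loss = loss_fn(logits, y)
    opt_part2.zero_grad()
    loss.backward()
    opt_part2.step()                                    # Part-2 updated per client, in order

    opt1.zero_grad()
    act.backward(act_server.grad)                       # cut-layer gradient "returned" to client
    opt1.step()

# --- end of round: the aggregator averages the Part-1 weights (FedAvg) ---
global_sd = copy.deepcopy(client_part1[0].state_dict())
for key in global_sd:
    global_sd[key] = torch.stack(
        [m.state_dict()[key] for m in client_part1]).mean(dim=0)
for m in client_part1:
    m.load_state_dict(global_sd)
```

Because Part-2's optimizer steps happen inside the per-client loop, its final state within a round is biased toward the last-processed clients' label distributions, which is the CF-like behavior the paper studies.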