Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data or model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, a paradigm known as federated KD. However, existing federated KD methods often perform poorly when clients' local models are trained on heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the divergence of local model outputs across clients caused by data heterogeneity, the server acts as a discriminator that guides clients' local model training toward consensus model outputs through a min-max game between the clients and the discriminator. Moreover, catastrophic forgetting may occur during clients' local training and global knowledge transfer because of the heterogeneous local data. To address this challenge, we design a less-forgetting regularization for both local training and global knowledge transfer, which preserves each client's ability to transfer knowledge to, and learn knowledge from, other clients. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.
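To make the described objective concrete, below is a minimal, hypothetical PyTorch sketch of one FedAL-style local update, assembled only from the abstract's description: a KD term toward the average output of all clients, an adversarial term against a server-side discriminator that tries to identify which client produced an output, and a less-forgetting regularizer. All names and weighting factors (`Discriminator`, `lambda_adv`, `lambda_lf`, the specific loss forms) are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: loss forms and hyperparameters are assumptions,
# not the exact FedAL objective from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

def client_local_step(client_model, old_model, discriminator, x, avg_logits,
                      num_clients, lambda_adv=0.1, lambda_lf=0.1):
    """One hypothetical local-training step for a single client."""
    logits = client_model(x)  # local model output on a local mini-batch

    # (1) Federated KD: match the average (ensemble) output of all clients.
    kd_loss = F.kl_div(F.log_softmax(logits, dim=1),
                       F.softmax(avg_logits, dim=1),
                       reduction="batchmean")

    # (2) Adversarial term of the min-max game: the discriminator predicts which
    # client an output came from; the client tries to make its outputs
    # indistinguishable (here, by pushing the discriminator toward a uniform
    # prediction), encouraging consensus outputs despite heterogeneous data.
    client_pred = discriminator(F.softmax(logits, dim=1))      # [B, num_clients]
    uniform = torch.full_like(client_pred, 1.0 / num_clients)
    adv_loss = F.kl_div(F.log_softmax(client_pred, dim=1), uniform,
                        reduction="batchmean")

    # (3) Less-forgetting regularization: stay close to the model's own
    # predictions from before this round to mitigate catastrophic forgetting.
    with torch.no_grad():
        old_logits = old_model(x)
    lf_loss = F.mse_loss(logits, old_logits)

    return kd_loss + lambda_adv * adv_loss + lambda_lf * lf_loss

def discriminator_step(discriminator, logits, client_id, num_clients):
    """Hypothetical server-side update: learn to identify the source client."""
    target = torch.full((logits.size(0),), client_id, dtype=torch.long)
    return F.cross_entropy(discriminator(F.softmax(logits, dim=1)), target)
```

In this sketch, the min-max structure comes from alternating `discriminator_step` (maximizing the discriminator's ability to tell clients apart) with `client_local_step` (minimizing the clients' distinguishability); the actual FedAL losses and training schedule are specified in the paper itself.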