Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data or model parameters with others. In federated KD, each client updates its local model using the average model output or feature of all client models as the target. However, existing federated KD methods often perform poorly when clients' local models are trained on heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address data heterogeneity among clients. First, to alleviate the divergence in local model outputs across clients caused by data heterogeneity, the server acts as a discriminator that guides clients' local model training toward consensus model outputs through a min-max game between the clients and the discriminator. Second, because clients' heterogeneous local data may cause catastrophic forgetting during both local training and global knowledge transfer, we design a less-forgetting regularization for both stages to guarantee clients' ability to transfer knowledge to, and learn knowledge from, other clients. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.
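As an illustration of the plain federated KD step described above (not FedAL's adversarial training or its less-forgetting regularization), the following minimal sketch averages the clients' predicted class probabilities and measures how far each client is from that average. It assumes that all clients evaluate their models on the same set of samples and that the target is the mean of softmax outputs; the function names `consensus_target` and `distillation_loss` are hypothetical and chosen only for this example.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def consensus_target(client_logits):
    """Average the clients' class probabilities on the same samples (assumed target)."""
    probs = np.stack([softmax(l) for l in client_logits])  # (clients, samples, classes)
    return probs.mean(axis=0)                               # (samples, classes)

def distillation_loss(local_logits, target_probs, eps=1e-12):
    """Mean KL(target || local prediction) over samples."""
    local = softmax(local_logits)
    kl = np.sum(target_probs * (np.log(target_probs + eps) - np.log(local + eps)), axis=-1)
    return float(kl.mean())

# Toy setup: 3 clients, 4 shared samples, 3 classes, random logits.
rng = np.random.default_rng(0)
client_logits = [rng.normal(size=(4, 3)) for _ in range(3)]

target = consensus_target(client_logits)
losses = [distillation_loss(l, target) for l in client_logits]
```

In this sketch each entry of `losses` is the quantity a client would minimize locally to move its outputs toward the shared target. Under heterogeneous data the clients' outputs diverge more, which is the situation the discriminator-guided min-max game is meant to address.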