Decentralized learning enables serverless training of deep neural networks (DNNs) in a distributed manner across multiple nodes. This makes it possible to use large datasets and to train on a wide variety of data sources. However, a key challenge in decentralized learning is the heterogeneity of the data distribution across nodes. In this paper, we propose In-Distribution Knowledge Distillation (IDKD) to address this challenge. The goal of IDKD is to homogenize the data distribution across the nodes. While such homogenization could be achieved by exchanging data among the nodes at the cost of privacy, IDKD achieves the same objective through a common public dataset shared across nodes, without violating the privacy constraint. This public dataset is distinct from the training data and is used to distill the knowledge of each node and communicate it to its neighbors through the generated labels. In traditional knowledge distillation, the generalization of the distilled model suffers because all public-dataset samples are used, irrespective of their similarity to the local dataset. We therefore introduce an Out-of-Distribution (OoD) detector at each node to label only the subset of the public dataset that lies close to the local training data distribution. Only the labels for these subsets are exchanged among the nodes, and, after appropriate label averaging, each node is fine-tuned on these data subsets together with its local data. Our experiments on multiple image classification datasets and graph topologies show that IDKD is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead.
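The in-distribution filtering and label-averaging steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the OoD detector is replaced by the simple maximum-softmax-probability (MSP) baseline with a hypothetical `threshold`, labels are soft probability vectors keyed by public-sample index, and "appropriate label averaging" is approximated by a uniform mean over the nodes (self plus neighbors) that labeled a given sample. All helper names are hypothetical, and the final fine-tuning step is omitted.

```python
import math
from typing import Dict, List

# Soft labels keyed by the index of a sample in the shared public dataset.
Labels = Dict[int, List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_in_distribution(public_logits: List[List[float]], threshold: float) -> Labels:
    """Label only public samples the local model deems in-distribution.

    Assumption: the MSP score (max softmax probability) stands in for the
    paper's OoD detector; samples scoring below `threshold` are treated as OoD.
    """
    selected: Labels = {}
    for idx, logits in enumerate(public_logits):
        probs = softmax(logits)
        if max(probs) >= threshold:
            selected[idx] = probs
    return selected


def average_labels(own: Labels, neighbors: List[Labels]) -> Labels:
    """Uniformly average soft labels per public sample over the nodes that sent one.

    Assumption: a plain mean approximates the paper's label-averaging rule.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for labels in [own] + neighbors:
        for idx, probs in labels.items():
            if idx not in sums:
                sums[idx] = [0.0] * len(probs)
                counts[idx] = 0
            sums[idx] = [a + b for a, b in zip(sums[idx], probs)]
            counts[idx] += 1
    return {idx: [v / counts[idx] for v in vec] for idx, vec in sums.items()}


if __name__ == "__main__":
    # Hypothetical logits of two neighboring nodes on three public samples (2 classes).
    node_a = select_in_distribution([[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]], threshold=0.7)
    node_b = select_in_distribution([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0]], threshold=0.7)
    targets_a = average_labels(node_a, [node_b])  # soft targets node A would fine-tune on
    print(sorted(node_a), sorted(node_b), sorted(targets_a))
```

With `threshold=0.7`, node A keeps public samples 0 and 2, node B keeps samples 1 and 2, and only sample 2, which both nodes labeled, is actually averaged. Only the selected indices and their soft labels cross the network, which is consistent with the low communication overhead the abstract claims.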