Homogenizing Non-IID datasets via In-Distribution Knowledge Distillation for Decentralized Learning

Decentralized learning enables serverless training of deep neural networks (DNNs) in a distributed manner across multiple nodes. This makes it possible to train on large datasets drawn from a wide variety of data sources. However, a key challenge in decentralized learning is the heterogeneity of the data distribution across nodes. In this paper, we propose In-Distribution Knowledge Distillation (IDKD) to address this challenge. The goal of IDKD is to homogenize the data distribution across the nodes. While such homogenization could be achieved by exchanging data among the nodes at the cost of privacy, IDKD achieves the same objective using a public dataset common to all nodes, without violating the privacy constraint. This public dataset is distinct from the training dataset and is used to distill the knowledge of each node and communicate it to its neighbors through the generated labels. With traditional knowledge distillation, the generalization of the distilled model suffers because all public dataset samples are used irrespective of their similarity to the local dataset. We therefore introduce an Out-of-Distribution (OoD) detector at each node to label only the subset of the public dataset that maps close to the local training data distribution. Finally, only the labels corresponding to these subsets are exchanged among the nodes, and, after appropriate label averaging, each node is fine-tuned on these data subsets along with its local data. Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead.
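The selection and label-averaging steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the OoD detector or the averaging rule, so the sketch assumes a maximum-softmax-probability score with a fixed threshold and a plain mean over the nodes that kept each sample. The function names `select_in_distribution` and `average_labels`, the `threshold` parameter, and the toy probabilities are all hypothetical.

```python
import numpy as np


def select_in_distribution(probs, threshold=0.9):
    """Boolean mask over public samples that a node treats as in-distribution.

    Assumption: the OoD score is the maximum softmax probability; the
    detector used in the paper may differ.
    """
    return probs.max(axis=1) >= threshold


def average_labels(node_probs, node_masks):
    """Average soft labels over the nodes that kept each public sample.

    Returns the indices of public samples kept by at least one node and
    the averaged soft labels for those samples.
    """
    probs = np.stack(node_probs)    # shape: (nodes, samples, classes)
    masks = np.stack(node_masks)    # shape: (nodes, samples)
    counts = masks.sum(axis=0)      # number of nodes that kept each sample
    summed = (probs * masks[:, :, None]).sum(axis=0)
    keep = counts > 0
    averaged = summed[keep] / counts[keep, None]
    return np.flatnonzero(keep), averaged


# Toy example: a node and one neighbor, three public samples, two classes.
probs_a = np.array([[0.95, 0.05], [0.60, 0.40], [0.50, 0.50]])
probs_b = np.array([[0.91, 0.09], [0.08, 0.92], [0.55, 0.45]])

mask_a = select_in_distribution(probs_a)
mask_b = select_in_distribution(probs_b)
kept_idx, soft_labels = average_labels([probs_a, probs_b], [mask_a, mask_b])

print(kept_idx)
print(soft_labels)
```

In this sketch only the masks and soft labels would need to be exchanged between neighbors, since every node already holds the public dataset. Each node would then fine-tune on the kept public samples with the averaged labels together with its local data.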
