Knowledge Distillation and Training Balance for Heterogeneous Decentralized Multi-Modal Learning over Wireless Networks

Decentralized learning is widely used to collaboratively train models on distributed data over wireless networks. Existing decentralized learning methods primarily focus on training single-modal networks. In decentralized multi-modal learning (DMML), modality heterogeneity and non-independent and identically distributed (non-IID) data across devices make it difficult for the model to capture correlated features across modalities. Moreover, modality competition can cause training imbalance among modalities, which can significantly degrade the performance of DMML. To improve training performance in the presence of non-IID data and modality heterogeneity, we propose a novel DMML with knowledge distillation (DMML-KD) framework, which decomposes the extracted features into modality-common and modality-specific components. In DMML-KD, a generator learns the global conditional distribution of the modality-common features, thereby guiding the modality-common features of different devices towards the same distribution. To address imbalanced training, we further propose to reduce the number of local iterations for modalities that train quickly. We design a balance metric based on parameter variation to evaluate the training speed of each modality in DMML-KD. Using this metric, we optimize the number of local iterations for each modality on each device, subject to the remaining energy of the device. Experimental results demonstrate that the proposed DMML-KD with training balance effectively improves the training performance of DMML.
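To make the training-balance idea concrete, the following is a minimal Python sketch of how a parameter-variation metric could drive the per-modality iteration counts under an energy budget. The abstract does not give the exact form of the balance metric or of the optimization problem, so everything here is an assumption made for illustration: the relative-change form of `balance_metric`, the inverse-proportional allocation rule, the greedy energy repair loop, and all names (`allocate_local_iterations`, `energy_per_iter`, `energy_budget`) are hypothetical and are not the authors' method.

```python
import math


def balance_metric(prev_params, curr_params):
    """Relative parameter change of one modality's sub-network over a round.

    Assumed form: ||curr - prev|| / ||prev||. A larger value is read here as
    "this modality is training faster".
    """
    diff = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev_params, curr_params)))
    norm = math.sqrt(sum(p ** 2 for p in prev_params))
    return diff / (norm + 1e-12)  # small constant avoids division by zero


def allocate_local_iterations(metrics, energy_per_iter, energy_budget,
                              max_iters, min_iters=1):
    """Assign fewer local iterations to faster modalities (hypothetical rule).

    metrics:         modality name -> balance metric value
    energy_per_iter: modality name -> energy cost of one local iteration
    energy_budget:   remaining energy available on the device
    """
    slowest = min(metrics.values())
    iters = {}
    for name, value in metrics.items():
        # The slowest modality gets max_iters; faster ones are scaled down.
        ratio = slowest / value if value > 0 else 1.0
        iters[name] = max(min_iters, round(max_iters * ratio))

    def total_energy(allocation):
        return sum(allocation[m] * energy_per_iter[m] for m in allocation)

    # Greedy repair: trim the largest allocation until the budget is met
    # or every modality is already at min_iters.
    while (total_energy(iters) > energy_budget
           and any(n > min_iters for n in iters.values())):
        largest = max(iters, key=lambda m: iters[m])
        iters[largest] -= 1
    return iters


if __name__ == "__main__":
    metrics = {"audio": 0.4, "video": 0.1}  # audio is changing faster
    plan = allocate_local_iterations(
        metrics,
        energy_per_iter={"audio": 1, "video": 2},
        energy_budget=14,
        max_iters=8,
    )
    print(plan)
```

In this sketch the faster modality receives fewer local iterations, and the allocation is then trimmed until it fits the device's remaining energy. A real implementation would replace the greedy loop with whatever optimization the full paper formulates.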
