Federated Edge Learning (FEEL) emerges as a pioneering distributed machine learning paradigm for 6G hyper-connectivity, harnessing data from Internet of Things (IoT) devices while upholding data privacy. However, current FEEL algorithms struggle with non-independent and identically distributed (non-IID) data, leading to elevated communication costs and degraded model accuracy. To address this statistical imbalance in FEEL, we introduce a clustered data sharing framework that mitigates data heterogeneity by selectively sharing partial data from cluster heads to trusted associates through sidelink-aided multicasting. The collective communication pattern is integral to FEEL training, where both cluster formation and the efficiency of communication and computation simultaneously affect training latency and accuracy. To tackle the tightly coupled data sharing and resource optimization, we decompose the overall optimization problem into client clustering and effective data sharing subproblems. Specifically, a distribution-based adaptive clustering algorithm (DACA) is devised based on three deduced cluster-forming conditions, which ensures the maximum sharing yield. Meanwhile, we design a stochastic-optimization-based joint computing frequency and shared data volume optimization (JFVO) algorithm, which determines the optimal resource allocation under an uncertain objective function. Experiments show that the proposed framework enables FEEL on non-IID datasets with a faster convergence rate and higher model accuracy in communication-limited environments.
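The abstract describes clustering clients by their local data distributions before sharing data within clusters. As a loose, hypothetical illustration of distribution-based clustering in general (not the paper's DACA algorithm, whose three cluster-forming conditions are specified in the paper body), clients could be grouped greedily by the Jensen-Shannon distance between their label histograms; the `threshold` value here is an arbitrary assumption:

```python
import numpy as np

def label_distribution(labels, num_classes):
    # Normalized label histogram of one client's local dataset.
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def js_distance(p, q):
    # Jensen-Shannon distance between two label distributions.
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # skip zero-probability classes (0 * log 0 = 0)
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def greedy_cluster(distributions, threshold=0.3):
    # Assign each client to the first cluster whose head has a
    # sufficiently similar label distribution; otherwise the client
    # starts a new cluster as its head.
    clusters = []  # list of (head_index, member_indices)
    for i, p in enumerate(distributions):
        for head, members in clusters:
            if js_distance(distributions[head], p) < threshold:
                members.append(i)
                break
        else:
            clusters.append((i, [i]))
    return clusters

# Toy non-IID setup: three clients dominated by class 0, three by class 1.
client_labels = [
    np.array([0] * 9 + [1]),
    np.array([0] * 8 + [1] * 2),
    np.array([0] * 9 + [1]),
    np.array([1] * 9 + [0]),
    np.array([1] * 8 + [0] * 2),
    np.array([1] * 9 + [0]),
]
dists = [label_distribution(y, num_classes=2) for y in client_labels]
clusters = greedy_cluster(dists)
```

On this toy data the greedy pass yields two clusters, one per dominant class; a real scheme would additionally fold in the communication and computation costs that the abstract couples with cluster formation.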