This study explores the benefits of integrating clustered federated learning (CFL) with non-orthogonal multiple access (NOMA) under non-independent and identically distributed (non-IID) datasets, where multiple devices participate in the aggregation subject to time limitations and a finite number of sub-channels. A detailed theoretical analysis is presented of the generalization gap, which measures the degree to which the data distribution is non-IID. Solutions to the challenges posed by non-IID conditions are then proposed, together with an analysis of their properties. Specifically, users' data distributions are parameterized as concentration parameters and grouped using spectral clustering, with the Dirichlet distribution serving as the prior. The analysis of the generalization gap and convergence rate guides the design of sub-channel assignment through a matching-based algorithm, and the power allocation is obtained from the Karush-Kuhn-Tucker (KKT) conditions, which yield a closed-form solution. Extensive simulation results show that the proposed cluster-based FL framework can outperform FL baselines in terms of both test accuracy and convergence rate. Moreover, jointly optimizing sub-channel and power allocation in NOMA-enhanced networks can significantly improve performance.
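The clustering step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes only two clusters, uses each user's label distribution directly as the clustering feature instead of an estimated concentration parameter, and splits users by the sign of the Fiedler vector of an unnormalized graph Laplacian. The function names, the concentration vectors `alpha_a` and `alpha_b`, and the kernel width `sigma` are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_user_distributions(alphas, users_per_group, rng):
    """Draw one label distribution per user from its group's Dirichlet prior."""
    rows = [rng.dirichlet(alpha) for alpha in alphas for _ in range(users_per_group)]
    return np.vstack(rows)


def spectral_bipartition(props, sigma=0.3):
    """Split users into two clusters via the Fiedler vector (assumed simplification)."""
    # Pairwise squared Euclidean distances between users' label distributions.
    sq_dist = ((props[:, None, :] - props[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian (RBF) affinity; sigma is an illustrative kernel width.
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)


rng = np.random.default_rng(0)
# Hypothetical concentration parameters: group A favors classes 0-4, group B classes 5-9.
alpha_a = np.array([10.0] * 5 + [0.1] * 5)
alpha_b = np.array([0.1] * 5 + [10.0] * 5)
props = sample_user_distributions([alpha_a, alpha_b], users_per_group=6, rng=rng)
labels = spectral_bipartition(props)
print(labels)
```

With concentration vectors this well separated, the first six users and the last six users should fall into different clusters; which cluster receives label 0 is arbitrary because the sign of an eigenvector is not fixed. Extending the sketch to more than two clusters would require running k-means on several leading eigenvectors, which is omitted here.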