Federated learning (FL) is an emerging machine learning (ML) paradigm that enables heterogeneous edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server. However, beyond heterogeneous device capacity, FL participants often differ in their data distributions; that is, their data are not independent and identically distributed (non-IID). Many existing works present point solutions to issues such as slow convergence, low final accuracy, and bias in FL, all of which stem from client heterogeneity. In this paper, we explore an additional layer of complexity to mitigate such heterogeneity: grouping clients with statistically similar data distributions into cohorts. We propose Auxo to gradually identify such cohorts in large-scale, low-availability, and resource-constrained FL populations. Auxo then adaptively determines how to train cohort-specific models to achieve better model performance while ensuring resource efficiency. Our extensive evaluations show that, by identifying cohorts with smaller heterogeneity and performing efficient cohort-based training, Auxo improves various existing FL solutions in terms of final accuracy (by 2.1%-8.2%), convergence time (by up to 2.2×), and model bias (by 4.8%-53.8%).