Data privacy and long-tailed distributions are the norm rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets were aggregated globally, they would jointly exhibit a long-tailed distribution. In this setting, existing federated optimization and centralized long-tailed learning methods are difficult to apply because of two challenges: (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprising a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on feedback about the global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during model training and obtain a global model that performs better on the minority classes while maintaining performance on the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
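To make the Fed-LT setting concrete, the following is a minimal sketch of how such a data split could be simulated. It is not taken from the paper: the exponential class-count profile, the Dirichlet partition across clients, and the names `long_tailed_counts`, `dirichlet_partition`, `imbalance_factor`, and `alpha` are all assumptions chosen for illustration. It illustrates only the data setting and does not implement the SGB or DPA modules.

```python
import random


def long_tailed_counts(num_classes, max_count, imbalance_factor):
    """Per-class sample counts that decay exponentially from head to tail.

    Assumed profile: n_c = max_count * (1 / imbalance_factor) ** (c / (C - 1)),
    so the head class has max_count samples and the tail class has
    max_count / imbalance_factor samples.
    """
    if num_classes == 1:
        return [max_count]
    return [
        max(1, round(max_count * (1.0 / imbalance_factor) ** (c / (num_classes - 1))))
        for c in range(num_classes)
    ]


def dirichlet_partition(class_counts, num_clients, alpha, seed=0):
    """Split every class across clients with Dirichlet(alpha) proportions.

    Returns client_counts[k][c], the number of class-c samples on client k.
    Smaller alpha gives more heterogeneous clients (an illustrative choice).
    """
    rng = random.Random(seed)
    client_counts = [[0] * len(class_counts) for _ in range(num_clients)]
    for c, n in enumerate(class_counts):
        # Normalised Gamma draws are one way to sample from a Dirichlet.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        assigned = 0
        for k in range(num_clients - 1):
            share = int(n * gammas[k] / total)
            client_counts[k][c] = share
            assigned += share
        # The last client takes the remainder so that no sample is lost.
        client_counts[num_clients - 1][c] = n - assigned
    return client_counts


# Global view: long-tailed. Local view: heterogeneous per client.
global_counts = long_tailed_counts(num_classes=10, max_count=5000, imbalance_factor=100)
clients = dirichlet_partition(global_counts, num_clients=5, alpha=0.5, seed=0)
```

In this sketch no single client sees `global_counts`; each holds only its own row of `clients`, and summing the rows recovers the long-tailed global profile. This is the quantity that, per the abstract, has to be characterized under privacy constraints.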