Navigating Data Heterogeneity in Federated Learning: A Semi-Supervised Federated Object Detection

Federated Learning (FL) has emerged as a potent framework for training models across distributed data sources while maintaining data privacy. Nevertheless, it faces challenges from limited high-quality labels and non-IID client data, particularly in applications such as autonomous driving. To address these hurdles, we explore Semi-Supervised Federated Object Detection (SSFOD), a setting that has received little prior study. We present an SSFOD framework designed for scenarios where labeled data reside only at the server while clients hold only unlabeled data. Notably, our method is the first implementation of SSFOD for clients with 0% labeled non-IID data, in contrast to previous studies that keep some subset of labels at each client. We propose FedSTO, a two-stage strategy consisting of Selective Training followed by Orthogonally enhanced full-parameter training, to address data shift (e.g., weather conditions) between server and clients. Our contributions include selectively refining the backbone of the detector to avert overfitting, orthogonality regularization to boost representation divergence, and local EMA-driven pseudo label assignment to yield high-quality pseudo labels. Extensive validation on prominent autonomous driving datasets (BDD100K, Cityscapes, and SODA10M) attests to the efficacy of our approach, demonstrating state-of-the-art results. Remarkably, FedSTO, using just 20-30% of labels, performs nearly as well as fully-supervised centralized training methods.
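Three of the ingredients named in the abstract (EMA-driven pseudo labeling, orthogonality regularization, and confidence-based pseudo label selection) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the decay of 0.999, the score threshold of 0.5, and the specific penalty form ||WᵀW − I||²_F are all assumptions chosen for illustration; the abstract does not state which variants FedSTO uses.

```python
def ema_update(teacher, student, decay=0.999):
    """Blend student weights into the teacher with an exponential moving average.

    Both arguments map a (hypothetical) parameter name to a flat list of floats.
    The decay value is an assumed default, not taken from the paper.
    """
    return {
        name: [decay * t + (1.0 - decay) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


def orthogonality_penalty(weight):
    """Return ||W^T W - I||_F^2 for a matrix given as a list of rows.

    This is one common form of orthogonality regularizer; it is zero when
    the columns of W are orthonormal. Assumed form, for illustration only.
    """
    cols = len(weight[0])
    penalty = 0.0
    for i in range(cols):
        for j in range(cols):
            gram = sum(row[i] * row[j] for row in weight)
            target = 1.0 if i == j else 0.0
            penalty += (gram - target) ** 2
    return penalty


def select_pseudo_labels(detections, threshold=0.5):
    """Keep teacher detections whose confidence reaches the threshold.

    Each detection is a dict with a "score" key; the threshold is an
    assumed value, not one reported in the abstract.
    """
    return [d for d in detections if d["score"] >= threshold]
```

In this sketch, a client would run its EMA teacher on unlabeled images, keep only the confident detections as pseudo labels for the student, and add the orthogonality penalty (scaled by some coefficient) to the student's detection loss during the full-parameter stage.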
