Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model. In FSSL, heterogeneous data can introduce prediction bias into the model, skewing its predictions toward certain classes. Existing FSSL methods primarily tackle this issue by enforcing consistency in model parameters or outputs. However, because the models themselves are biased, merely constraining their consistency is not sufficient to alleviate prediction bias. In this paper, we examine this bias from a Bayesian perspective and show that it originates mainly from label prior bias in the training data. Building on this insight, we propose FedDB, a debiasing method for FSSL. FedDB uses the Average Prediction Probability of Unlabeled Data (APP-U) to approximate the biased prior. During local training, FedDB applies APP-U to refine pseudo-labeling through Bayes' theorem, significantly reducing the label prior bias. During model aggregation, FedDB uses the APP-U of participating clients to construct unbiased aggregation weights, effectively reducing bias in the global model. Experimental results show that FedDB outperforms existing FSSL methods. The code is available at https://github.com/GuogangZhu/FedDB.
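The Bayesian correction can be made concrete. If a model trained under a biased label prior p_train(y) outputs p_train(y|x) ∝ p(x|y) p_train(y), then under a uniform target prior the unbiased posterior is p(y|x) ∝ p_train(y|x) / p_train(y). The minimal sketch below assumes exactly that form, with APP-U (the mean softmax output over unlabeled samples) standing in for p_train(y). The abstract does not give FedDB's exact formula, so the function names `app_u` and `debiased_pseudo_labels` and the FixMatch-style confidence threshold of 0.95 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def app_u(probs: np.ndarray) -> np.ndarray:
    """Average Prediction Probability of Unlabeled data (APP-U).

    probs: (num_samples, num_classes) softmax outputs on unlabeled data.
    Returns the per-class mean, used here as an estimate of the biased prior.
    """
    return probs.mean(axis=0)


def debiased_pseudo_labels(probs: np.ndarray, prior: np.ndarray,
                           threshold: float = 0.95, eps: float = 1e-12):
    """Bayes-style prior correction of pseudo-labels (assumed form).

    Divides each predicted distribution by the estimated prior, renormalizes
    each row, and keeps only samples whose corrected confidence reaches
    `threshold` (a FixMatch-style filter, an assumption of this sketch).
    Returns (labels, mask, adjusted_probs).
    """
    adjusted = probs / (prior + eps)                        # p(y|x) / p_train(y)
    adjusted = adjusted / adjusted.sum(axis=1, keepdims=True)  # renormalize rows
    labels = adjusted.argmax(axis=1)                        # debiased hard labels
    mask = adjusted.max(axis=1) >= threshold                # confidence filter
    return labels, mask, adjusted


if __name__ == "__main__":
    # Toy batch where the model leans toward class 0.
    probs = np.array([[0.60, 0.40], [0.70, 0.30], [0.55, 0.45]])
    prior = app_u(probs)
    labels, mask, adjusted = debiased_pseudo_labels(probs, prior, threshold=0.0)
    print(prior, labels)
```

In this toy batch the raw argmax labels every sample as class 0, whereas the prior-corrected labels are `[1, 0, 1]`: samples whose class-0 probability falls below the batch average are reassigned once the skew is divided out. With a uniform prior the correction leaves the predictions unchanged.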