Decentralized federated learning (DFL) realizes cooperative model training among connected clients without relying on a central server, thereby mitigating communication bottlenecks and eliminating the single point of failure present in centralized federated learning (CFL). Most existing work on DFL focuses on supervised learning, assuming each client possesses sufficient labeled data for local training. However, in real-world applications, much of the data is unlabeled. We address this by considering a challenging yet practical semi-supervised learning (SSL) scenario in DFL, where clients may have varying data sources: some with few labeled samples, some with purely unlabeled data, and others with both. In this work, we propose SemiDFL, the first semi-supervised DFL method, which enhances DFL performance in SSL scenarios by establishing consensus in both the data and model spaces. Specifically, we utilize neighborhood information to improve the quality of pseudo-labeling, which is crucial for effectively leveraging unlabeled data. We then design a consensus-based diffusion model to generate synthesized data, which is combined with pseudo-labeled data to create mixed datasets. Additionally, we develop an adaptive aggregation method that leverages model accuracy on the synthesized data to further enhance SemiDFL's performance. Through extensive experiments, we demonstrate the substantial performance superiority of the proposed SemiDFL over existing CFL and DFL schemes in both IID and non-IID SSL scenarios.
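As a minimal illustration of the neighborhood-enhanced pseudo-labeling idea (the abstract does not give the exact rule, so the function name, arguments, and threshold below are all assumptions for exposition): a client could average its own softmax predictions with those of its neighbors and keep only pseudo-labels whose consensus confidence clears a threshold.

```python
import numpy as np

def consensus_pseudo_labels(local_probs, neighbor_probs, threshold=0.9):
    """Hypothetical sketch, not the paper's exact procedure.
    local_probs: (N, C) softmax outputs of the local model on unlabeled data.
    neighbor_probs: list of (N, C) arrays from neighboring clients' models.
    Returns (indices, labels) of the samples retained as pseudo-labeled."""
    # Average class probabilities over the client and its neighbors.
    consensus = np.mean(np.stack([local_probs, *neighbor_probs]), axis=0)
    # Keep only samples where the consensus prediction is confident.
    confidence = consensus.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, consensus[keep].argmax(axis=1)
```

Filtering on consensus confidence rather than a single model's confidence is one plausible way neighborhood information could raise pseudo-label quality: a sample is labeled only when the neighborhood agrees.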
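Similarly, the adaptive aggregation step is only summarized above; the sketch below shows one plausible realization under the assumption that each client weights its neighbors' model parameters by their measured accuracy on the synthesized (diffusion-generated) data. The names `adaptive_aggregate` and `synth_acc` are hypothetical.

```python
import numpy as np

def adaptive_aggregate(neighbor_models, synth_acc):
    """Hypothetical sketch: accuracy-weighted parameter averaging.
    neighbor_models: list of parameter vectors (np.ndarray), one per neighbor.
    synth_acc: matching list of accuracies in [0, 1] on synthesized data."""
    acc = np.asarray(synth_acc, dtype=float)
    weights = acc / acc.sum()              # normalize accuracies into weights
    stacked = np.stack(neighbor_models)    # shape: (num_neighbors, num_params)
    return np.tensordot(weights, stacked, axes=1)  # weighted parameter average

# Toy usage: three neighbors with 10-dimensional parameter vectors.
models = [np.random.randn(10) for _ in range(3)]
aggregated = adaptive_aggregate(models, synth_acc=[0.82, 0.75, 0.90])
```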