Federated Semi-Supervised Learning (FSSL) aims to collaboratively train a global model across clients by leveraging partially annotated local data in a privacy-preserving manner. In FSSL, data heterogeneity is a challenging issue that exists both across clients and within clients. External heterogeneity refers to the discrepancy in data distributions across different clients, while internal heterogeneity denotes the mismatch between labeled and unlabeled data within a client. Most FSSL methods design fixed or dynamic parameter-aggregation strategies to collect client knowledge on the server (external) and/or filter out low-confidence unlabeled samples to reduce errors on local clients (internal). However, the former struggles to precisely fit the ideal global distribution through direct weight aggregation, and the latter leaves fewer data participating in FL training. To this end, we propose a proxy-guided framework called ProxyFL that mitigates external and internal heterogeneity simultaneously via a unified proxy. Specifically, we regard the learnable classifier weights as proxies that simulate the category distribution both locally and globally. For external heterogeneity, we explicitly optimize the global proxy against outliers instead of directly aggregating weights; for internal heterogeneity, we re-include discarded samples in training through a positive-negative proxy pool, mitigating the impact of potentially incorrect pseudo-labels. In-depth experiments and theoretical analysis demonstrate the superior performance and convergence of our method in FSSL.
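To make the proxy idea concrete, the following is a minimal sketch of how classifier weight rows can serve as class proxies for pseudo-labeling, and how low-confidence samples might be kept in training by repelling their least likely ("negative") proxies instead of being discarded. All function names, the threshold `tau`, the temperature `temp`, and the negative-pool size `k` are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def _normalize(x):
    # L2-normalize rows so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def proxy_pseudo_label(features, classifier_weight, tau=0.9, temp=0.1):
    """Pseudo-label unlabeled features by similarity to class proxies.

    Treats each row of the classifier weight matrix as a class proxy
    (as described in the abstract). `tau` and `temp` are hypothetical
    hyperparameters chosen for this sketch.
    """
    proxies = _normalize(classifier_weight)          # (C, d) class proxies
    feats = _normalize(features)                     # (N, d) sample features
    logits = feats @ proxies.T / temp                # scaled cosine logits
    logits -= logits.max(axis=1, keepdims=True)      # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    pseudo = probs.argmax(axis=1)                    # hard pseudo-labels
    confident = probs.max(axis=1) >= tau             # confidence mask
    return pseudo, confident, probs

def proxy_pool_loss(features, classifier_weight, pseudo, confident, probs, k=2):
    """Sketch of a positive-negative proxy pool loss.

    Confident samples are pulled toward their positive (pseudo-class)
    proxy; uncertain samples, rather than being filtered out, are pushed
    away from their k least probable proxies. This is one plausible
    reading of the abstract, not the paper's exact objective.
    """
    proxies = _normalize(classifier_weight)
    feats = _normalize(features)
    sims = feats @ proxies.T                         # (N, C) cosine similarities
    loss = 0.0
    if confident.any():
        # Attract confident samples to their pseudo-class proxy.
        loss -= sims[confident, pseudo[confident]].mean()
    uncertain = ~confident
    if uncertain.any():
        # Repel uncertain samples from their k least likely proxies.
        neg_idx = np.argsort(probs[uncertain], axis=1)[:, :k]
        loss += np.take_along_axis(sims[uncertain], neg_idx, axis=1).mean()
    return loss
```

The design choice here is that every unlabeled sample contributes a gradient: confident ones through the positive term, uncertain ones through the negative term, so no data is wasted by confidence filtering.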