Semi-supervised learning (SSL) has made great progress through successive improvements to the self-training framework with pseudo labeling. The main challenge is how to identify high-quality pseudo labels in the presence of confirmation bias. However, existing pseudo-label selection strategies are limited to pre-defined schemes or complex hand-crafted policies designed specifically for classification, and they fail to achieve high-quality labels, fast convergence, and task versatility simultaneously. To this end, we propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate and select high-quality pseudo labels, and that can be plugged into mainstream SSL methods across a wide range of task types and scenarios. To mitigate confirmation bias, SemiReward is trained online in two stages with a generator model and a subsampling strategy. On classification and regression tasks across 13 standard SSL benchmarks spanning three modalities, extensive experiments verify that SemiReward achieves significant performance gains and faster convergence over Pseudo Label, FlexMatch, and Free/SoftMatch. Code and models are available at https://github.com/Westlake-AI/SemiReward.
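The selection step described in the abstract, scoring each pseudo label and keeping only the high-scoring ones, can be illustrated with a minimal sketch. This is not the SemiReward implementation: the names `select_pseudo_labels`, `toy_reward`, and the `threshold` parameter are hypothetical, and the toy scorer is only a stand-in for the learned reward model, whose architecture and two-stage training are not shown here.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Label = Sequence[float]


def select_pseudo_labels(
    samples: Sequence[Sample],
    pseudo_labels: Sequence[Label],
    reward_fn: Callable[[Sample, Label], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[Sample, Label, float]]:
    """Keep the (sample, pseudo label, score) triples whose score passes the threshold."""
    if len(samples) != len(pseudo_labels):
        raise ValueError("samples and pseudo_labels must have the same length")
    kept = []
    for x, y in zip(samples, pseudo_labels):
        score = reward_fn(x, y)  # a learned scorer would be called here
        if score >= threshold:
            kept.append((x, y, score))
    return kept


def toy_reward(x: Sample, y: Label) -> float:
    """Stand-in scorer (assumption): uses the largest class probability as the score."""
    return max(y)


if __name__ == "__main__":
    samples = [[0.0], [1.0], [2.0]]
    pseudo_labels = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
    for x, y, score in select_pseudo_labels(samples, pseudo_labels, toy_reward, threshold=0.7):
        print(x, y, score)
```

Because the scorer is passed in as a plain callable, the same filtering loop would apply to a regression task by swapping in a scorer that accepts real-valued labels; the toy scorer above only makes sense for class-probability vectors.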