We propose UnMixMatch, a semi-supervised learning framework that learns effective representations from unconstrained unlabelled data in order to scale up performance. Most existing semi-supervised methods assume that labelled and unlabelled samples are drawn from the same distribution, which limits how much they can gain from free-living unlabelled data and, in turn, hinders the generalizability and scalability of semi-supervised learning. Our method aims to overcome this constraint and make effective use of unconstrained unlabelled data. UnMixMatch consists of three main components: a supervised learner with hard augmentations that provides strong regularization, a contrastive consistency regularizer that learns underlying representations from the unlabelled data, and a self-supervised loss that enhances those representations. We perform extensive experiments on four commonly used datasets and demonstrate superior performance over existing semi-supervised methods, with a performance boost of 4.79%. Extensive ablation and sensitivity studies show the effectiveness and impact of each proposed component.
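The three components named in the abstract can be read as three loss terms that are added together during training. The following is only a minimal sketch of that reading, not the authors' implementation. The abstract does not give the loss formulas, so the one-directional InfoNCE-style form of the contrastive term, the weights `lambda_con` and `lambda_self`, the `temperature` value, and all function names are assumptions made for illustration. The self-supervised pretext task is not specified in the abstract, so its loss is passed in as a precomputed number.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (log-sum-exp stabilised)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_consistency(view_a, view_b, temperature=0.5):
    """Assumed InfoNCE-style term (hypothetical form, not from the abstract).

    view_a[i] and view_b[i] are embeddings of two augmentations of the same
    unlabelled sample. Each anchor in view_a should be most similar to its
    own counterpart in view_b; the other rows of view_b act as negatives.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        sims = [cosine(anchor, other) / temperature for other in view_b]
        losses.append(cross_entropy(sims, i))
    return sum(losses) / len(losses)


def total_loss(supervised, contrastive, self_supervised,
               lambda_con=1.0, lambda_self=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return supervised + lambda_con * contrastive + lambda_self * self_supervised


# Toy usage: one labelled sample, two unlabelled samples with two views each.
sup = cross_entropy([2.0, 0.5, -1.0], label=0)
con = contrastive_consistency([[1.0, 0.0], [0.0, 1.0]],
                              [[0.9, 0.1], [0.2, 0.8]])
loss = total_loss(sup, con, self_supervised=0.7, lambda_self=0.5)
```

In this sketch the supervised term sees only labelled data, while the contrastive term needs no labels at all, which is why it can in principle be computed on unlabelled data drawn from a different distribution.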