Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially available, motivating a recent line of work on semi-supervised methods inspired by self-supervised principles. In this paper, we propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods such as SwAV or DINO into semi-supervised learners. More precisely, we introduce a multi-task framework that merges a supervised objective using ground-truth labels and a self-supervised objective relying on clustering assignments into a single cross-entropy loss. This approach may be interpreted as constraining the cluster centroids to be class prototypes. Despite its simplicity, we provide empirical evidence that our approach is highly effective and achieves state-of-the-art performance on CIFAR100 and ImageNet.
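The single-loss formulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it is one possible reading under stated assumptions: the number of prototypes equals the number of classes, unlabeled samples are marked with the label `-1`, and the soft cluster assignments (`cluster_probs`) are supplied by whatever clustering step the underlying method uses (for instance the assignment step of SwAV or the teacher output of DINO). The function names `build_targets` and `unified_cross_entropy` and the `temperature` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis of a 2-D array."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def build_targets(cluster_probs, labels, num_classes):
    """Combine soft cluster assignments with ground-truth labels.

    cluster_probs: (N, K) soft assignments, each row sums to 1 (assumed given).
    labels: (N,) integer class ids, with -1 marking unlabeled samples (assumption).
    Labeled rows are overwritten with one-hot targets, so prototype k is tied
    to class k; unlabeled rows keep their cluster assignment.
    """
    targets = np.array(cluster_probs, dtype=float)
    labeled = labels >= 0
    targets[labeled] = np.eye(num_classes)[labels[labeled]]
    return targets


def unified_cross_entropy(features, prototypes, targets, temperature=0.1):
    """One cross-entropy over labeled and unlabeled samples alike.

    features: (N, D) embeddings; prototypes: (K, D) cluster centroids.
    """
    logits = features @ prototypes.T / temperature
    return float(-(targets * log_softmax(logits)).sum(axis=1).mean())
```

In this sketch the supervised and self-supervised terms differ only in where the target distribution comes from, which is why a single cross-entropy suffices and why the centroids end up acting as class prototypes for the labeled samples.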