Recent advances in representation learning motivate a principled approach to the challenging problem of unsupervised image classification. We propose ContraCluster, an unsupervised image classification method that combines clustering with contrastive self-supervised learning. ContraCluster consists of three stages: (1) contrastive self-supervised pre-training (CPT), (2) contrastive prototype sampling (CPS), and (3) prototype-based semi-supervised fine-tuning (PB-SFT). CPS selects highly accurate, categorically prototypical images in the embedding space learned by contrastive learning. PB-SFT then treats the sampled prototypes as noisily labeled data and performs semi-supervised fine-tuning, leveraging the small set of prototypes together with the large pool of unlabeled data to further improve accuracy. We demonstrate empirically that ContraCluster achieves new state-of-the-art results on standard benchmark datasets, including CIFAR-10, STL-10, and ImageNet-10. For example, ContraCluster achieves about 90.8% accuracy on CIFAR-10, outperforming DAC (52.2%), IIC (61.7%), and SCAN (87.6%) by a large margin. This accuracy, obtained without any labels, is comparable to the 95.8% achieved by the best supervised counterpart.
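To make the prototype-sampling idea concrete, the following is a minimal sketch of how prototypical points might be picked from an embedding space and given pseudo-labels. The abstract does not specify the clustering procedure that CPS uses, so everything here is an assumption for illustration: the function name `sample_prototypes`, the use of spherical k-means with farthest-point initialization, and the parameters `n_clusters`, `per_cluster`, and `n_iter` are all hypothetical stand-ins, not the authors' method.

```python
import numpy as np


def sample_prototypes(embeddings, n_clusters, per_cluster, n_iter=50):
    """Return (indices, pseudo_labels) of the points closest to each cluster centre.

    Illustrative stand-in only: spherical k-means on L2-normalised embeddings.
    The clustering procedure of the original method is not given in the abstract.
    """
    # Work on the unit sphere so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Farthest-point initialisation: start from the first point, then
    # repeatedly add the point least similar to all centres chosen so far.
    chosen = [z[0]]
    for _ in range(1, n_clusters):
        max_sim = (z @ np.stack(chosen).T).max(axis=1)
        chosen.append(z[max_sim.argmin()])
    centres = np.stack(chosen)

    # Alternate between assigning points and re-estimating centres.
    for _ in range(n_iter):
        assign = (z @ centres.T).argmax(axis=1)
        for k in range(n_clusters):
            members = z[assign == k]
            if len(members) > 0:  # an empty cluster keeps its previous centre
                mean = members.mean(axis=0)
                centres[k] = mean / np.linalg.norm(mean)

    # Within each cluster, keep the members most similar to the centre.
    sims = z @ centres.T
    assign = sims.argmax(axis=1)
    indices, pseudo_labels = [], []
    for k in range(n_clusters):
        member_idx = np.flatnonzero(assign == k)
        order = np.argsort(-sims[member_idx, k])[:per_cluster]
        indices.extend(member_idx[order].tolist())
        pseudo_labels.extend([k] * len(order))
    return np.array(indices), np.array(pseudo_labels)
```

Under these assumptions, the returned indices and cluster ids would play the role of the small, noisily labeled set, and all remaining images would form the unlabeled pool for a semi-supervised fine-tuning stage. The cluster ids are arbitrary, so evaluating accuracy against ground-truth classes would still require matching clusters to classes.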