Self-supervised learning has shown impressive results in downstream classification tasks. However, there is limited work on understanding the failure modes of self-supervised models and interpreting their learned representations. In this paper, we study the representation space of state-of-the-art self-supervised models including SimCLR, SwAV, MoCo, BYOL, DINO, SimSiam, VICReg, and Barlow Twins. Without using class label information, we discover discriminative features that correspond to unique physical attributes in images and are present mostly in correctly classified representations. Using these features, we can compress the representation space by up to 40% without significantly affecting linear classification performance. We then propose the Self-Supervised Representation Quality Score (Q-Score), a model-agnostic, unsupervised score that can reliably predict whether a given sample is likely to be misclassified during linear evaluation, achieving an AUPRC of 91.45 on ImageNet-100 and 78.78 on ImageNet-1K. Q-Score can also be used as a regularization term on any pre-trained self-supervised model to remedy low-quality representations. Fine-tuning with Q-Score regularization boosts the linear classification performance of state-of-the-art self-supervised models by up to 5.8% on ImageNet-100 and 3.7% on ImageNet-1K compared to their baselines. Finally, using gradient heatmaps and Salient ImageNet masks, we define a metric to quantify the interpretability of each representation. We show that discriminative features are strongly correlated with core attributes, and that enhancing these features through Q-Score regularization makes representations more interpretable across all self-supervised models.