Self-supervised learning (SSL) has shown impressive results in downstream classification tasks. However, there is limited work on understanding the failure modes of SSL models and interpreting their learned representations. In this paper, we study the representation space of state-of-the-art self-supervised models including SimCLR, SwAV, MoCo, BYOL, DINO, SimSiam, VICReg and Barlow Twins. Without using class label information, we discover discriminative features that correspond to unique physical attributes in images and are present mostly in correctly classified representations. Using these features, we can compress the representation space by up to 40% without significantly affecting linear classification performance. We then propose the Self-Supervised Representation Quality Score (Q-Score), an unsupervised score that can reliably predict whether a given sample is likely to be misclassified during linear evaluation, achieving an AUPRC of 91.45 on ImageNet-100 and 78.78 on ImageNet-1K. Q-Score can also be used as a regularization term on pre-trained encoders to remedy low-quality representations. Fine-tuning with Q-Score regularization can boost the linear probing accuracy of SSL models by up to 5.8% on ImageNet-100 and 3.7% on ImageNet-1K compared to their baselines. Finally, using gradient heatmaps and Salient ImageNet masks, we define a metric to quantify the interpretability of each representation. We show that discriminative features are strongly correlated with core attributes, and that enhancing these features through Q-Score regularization makes SSL representations more interpretable.
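To make the AUPRC evaluation concrete, the following is a minimal sketch of how an unsupervised per-sample score could be assessed as a predictor of misclassification. It does not reproduce the Q-Score definition, which is not given in this abstract. The function name `average_precision`, the toy values, and the convention that a lower quality score signals a likely misclassification are all assumptions made for illustration. Average precision is used here as a common estimator of the area under the precision-recall curve, and ties between scores are not handled specially.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of positives.

    labels: 0/1 flags, where 1 marks the positive class
            (here, assumed to mean "misclassified by the linear probe").
    scores: ranking scores, where higher means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_pos


# Hypothetical toy data: one quality score per sample (illustrative values only).
quality = [2.1, 0.3, 1.7, 0.5, 0.9]
misclassified = [0, 1, 0, 1, 0]

# Assumption: low quality should rank first, so the score is negated.
ap = average_precision(misclassified, [-q for q in quality])
print(f"average precision: {ap:.4f}")
```

In this toy case the two misclassified samples have the two lowest scores, so the ranking is perfect. On real data, the flags would come from comparing linear-probe predictions with ground-truth labels, which are needed only for this evaluation and not for computing the score itself.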