Linear probing (LP) and $k$-NN evaluation on the labeled upstream dataset (e.g., ImageNet), together with transfer learning (TL) to various downstream datasets, are commonly used to evaluate the quality of visual representations learned via self-supervised learning (SSL). Although existing SSL methods perform well under these evaluation protocols, we observe that their performance is very sensitive to the hyperparameters involved in LP and TL. We argue that this behavior is undesirable, since truly generic representations should adapt easily to any other visual recognition task; that is, the learned representations should be robust to the choice of LP and TL hyperparameters. In this work, we investigate the cause of this sensitivity through extensive experiments with state-of-the-art SSL methods. First, we find that input normalization for LP is crucial for eliminating performance variation across hyperparameter settings. Specifically, applying batch normalization to the inputs of the linear classifier considerably improves the stability of evaluation and also resolves the inconsistency between the $k$-NN and LP metrics. Second, for TL, we demonstrate that the weight decay parameter in SSL significantly affects the transferability of the learned representations, an effect that cannot be identified by LP or $k$-NN evaluation on the upstream dataset. We believe that these findings will benefit the community by drawing attention to the shortcomings of current SSL evaluation schemes and underscoring the need to reconsider them.
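As a minimal sketch of the input normalization described for LP, the snippet below standardizes each feature dimension across a batch before the features would be passed to a linear classifier. It is an illustration under stated assumptions, not the authors' implementation: the function name `batch_standardize`, the toy feature values, and the choice to omit learnable scale and shift parameters are all hypothetical. It demonstrates one property of such normalization, namely that the classifier inputs become invariant to per-dimension rescaling and shifting of the raw features. Whether this property accounts for the reduced hyperparameter sensitivity reported above is not stated in the abstract.

```python
import math


def batch_standardize(features, eps=1e-5):
    """Standardize each feature dimension across the batch.

    This mirrors batch normalization in training mode without the
    learnable affine parameters (an assumption made for this sketch).
    """
    n = len(features)
    dim = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(dim)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in features) / n
        for j in range(dim)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dim)]
        for row in features
    ]


# Hypothetical frozen-encoder features: 4 samples, 2 dimensions whose
# scales differ by two orders of magnitude.
feats = [[1.0, 200.0], [2.0, 100.0], [3.0, 400.0], [4.0, 300.0]]

# The same features after an arbitrary rescaling and shift.
scaled = [[10.0 * x + 5.0 for x in row] for row in feats]

norm_a = batch_standardize(feats)
norm_b = batch_standardize(scaled)

# After standardization the two versions agree up to the effect of eps,
# so a linear classifier trained on either sees nearly identical inputs.
max_gap = max(
    abs(a - b) for row_a, row_b in zip(norm_a, norm_b) for a, b in zip(row_a, row_b)
)
print(f"largest difference after normalization: {max_gap:.2e}")
```

In a PyTorch evaluation pipeline, a comparable effect could be obtained by placing `torch.nn.BatchNorm1d(feature_dim, affine=False)` in front of the linear layer; whether the affine parameters are disabled in the original experiments is an assumption here.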