Self-supervised learning (SSL) pipelines differ in many design choices, such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This provides little insight into why or when a model is better, nor how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supervised approximation-estimation decomposition by considering errors arising from the representation learning step. Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization. We provide efficient estimators for each component and use them to analyze the effect of 30 design choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives valuable insights for designing and using SSL models. For example, it highlights the main sources of error and shows how to improve SSL in specific settings (full- vs few-shot) by trading off error components. All results and pretrained models are at https://github.com/YannDubs/SSL-Risk-Decomposition.
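As a rough sketch of how such a decomposition can be read (the notation here is illustrative and not taken from the paper): the classical supervised decomposition splits the risk of a learned predictor into an approximation term and an estimation term. A four-term version can be written as a telescoping sum, assuming hypothetical intermediate risks $r_A$, $r_U$, $r_P$ measured in progressively less idealized settings, with $R$ the risk of the actual SSL pipeline.

```latex
% Classical supervised decomposition (Bayes risk folded into the approximation term).
% \hat{f}: learned predictor; f^*_{\mathcal{F}}: best predictor in the family \mathcal{F}.
\[
R(\hat{f})
  = \underbrace{R(f^*_{\mathcal{F}})}_{\text{approximation}}
  + \underbrace{R(\hat{f}) - R(f^*_{\mathcal{F}})}_{\text{estimation}}
\]

% Illustrative four-term analogue. r_A, r_U, r_P are assumed intermediate risks
% (hypothetical symbols introduced only for this sketch).
\[
R
  = \underbrace{r_A}_{\text{approximation}}
  + \underbrace{(r_U - r_A)}_{\text{representation usability}}
  + \underbrace{(r_P - r_U)}_{\text{probe generalization}}
  + \underbrace{(R - r_P)}_{\text{encoder generalization}}
\]
```

Because the intermediate terms cancel, the four components sum to the overall risk by construction. Which idealized settings define the intermediate risks is specified in the paper itself and is not assumed here.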