Self-supervised learning (SSL) pipelines differ in many design choices, such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This provides little insight into why or when a model is better, or how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supervised approximation-estimation decomposition by considering errors arising from the representation learning step. Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization. We provide efficient estimators for each component and use them to analyze the effect of 30 design choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives valuable insights for designing and using SSL models. For example, it highlights the main sources of error and shows how to improve SSL in specific settings (full- vs few-shot) by trading off error components. All results and pretrained models are at https://github.com/YannDubs/SSL-Risk-Decomposition.
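As a rough illustration of how a four-component decomposition of this kind can be organized, the sketch below assumes the components telescope: each one is the increase in risk between two successive evaluation settings, so that they sum to the risk measured in the standard setting. That telescoping structure, the ordering of the settings, the names `RiskDecomposition` and `decompose`, and the numbers are all assumptions made for illustration; they are not the estimators or results described above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RiskDecomposition:
    """Hypothetical container for the four error components named in the abstract."""
    approximation: float
    usability: float
    probe_generalization: float
    encoder_generalization: float

    @property
    def total(self) -> float:
        # Under the telescoping assumption, the components sum to the final risk.
        return (self.approximation + self.usability
                + self.probe_generalization + self.encoder_generalization)


def decompose(risks: Sequence[float]) -> RiskDecomposition:
    """Turn four risk estimates into successive differences.

    `risks` is assumed to be ordered from the most idealized evaluation
    setting to the standard one; this ordering is an assumption of the sketch.
    """
    if len(risks) != 4:
        raise ValueError("expected exactly four risk estimates")
    r1, r2, r3, r4 = risks
    return RiskDecomposition(
        approximation=r1,
        usability=r2 - r1,
        probe_generalization=r3 - r2,
        encoder_generalization=r4 - r3,
    )


# Made-up numbers, for illustration only (not results from the paper).
example = decompose([0.10, 0.18, 0.22, 0.25])
```

Reading the components side by side is what allows two models with the same final risk to be told apart, for instance one limited by usability and another by probe generalization.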