We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while remaining agnostic to the architecture, learning algorithm, or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressiveness and introduce Cluster Learnability (CL) to assess learnability. CL is measured as the performance of a KNN classifier trained to predict labels obtained by clustering the representations with K-means. We then combine CL and ID into a single predictor, CLID. Through a large-scale empirical study with a diverse family of SSL algorithms, we find that CLID correlates better with in-distribution model performance than other recent competing evaluation schemes. We also benchmark CLID on out-of-domain generalization, where CLID serves as a predictor of the transfer performance of SSL models on several visual classification tasks, yielding improvements over the competing baselines.
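The CL measurement described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The held-out split, the number of clusters `k`, the neighbor count, and every function name are hypothetical choices. The abstract does not say which ID estimator is used, so the sketch swaps in the maximum-likelihood form of the TwoNN estimator (Facco et al., 2017) purely as an example. It also leaves out the rule that merges CL and ID into CLID, since that rule is not specified here.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster index (pseudo-label) per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point takes the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote over the labels of the n_neighbors closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


def cluster_learnability(reps, k=10, n_neighbors=5, train_frac=0.5, seed=0):
    """Hypothetical CL protocol: K-means pseudo-labels, then held-out KNN accuracy."""
    pseudo = kmeans(reps, k, seed=seed)
    idx = list(range(len(reps)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    train_idx, test_idx = idx[:cut], idx[cut:]
    train_x = [reps[i] for i in train_idx]
    train_y = [pseudo[i] for i in train_idx]
    hits = sum(knn_predict(train_x, train_y, reps[i], n_neighbors) == pseudo[i] for i in test_idx)
    return hits / len(test_idx)


def twonn_intrinsic_dimension(reps):
    """Maximum-likelihood TwoNN estimate (assumed choice): ID = N / sum_i log(r2_i / r1_i)."""
    log_ratios = []
    for i, p in enumerate(reps):
        # r1, r2: distances to the first and second nearest neighbours of point i.
        r1, r2 = sorted(math.sqrt(sq_dist(p, q)) for j, q in enumerate(reps) if j != i)[:2]
        if r1 > 0:  # skip exact duplicates, for which the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two well-separated Gaussian blobs in 3-D: pseudo-labels should be easy to learn.
    blobs = [[rng.gauss(m, 0.5) for _ in range(3)] for m in (0.0, 10.0) for _ in range(40)]
    print("CL:", cluster_learnability(blobs, k=2))
    # Points on a 2-D plane embedded in 3-D: the ID estimate should be close to 2.
    plane = [[u, v, 0.5 * u - v] for u, v in ((rng.random(), rng.random()) for _ in range(200))]
    print("ID:", twonn_intrinsic_dimension(plane))
```

On well-separated clusters the held-out KNN accuracy should approach 1, and points sampled from a plane embedded in 3-D should yield an ID estimate near 2. With real models, `reps` would be embeddings produced by an SSL encoder, and a vectorized library would replace these quadratic-time loops.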