Self-supervised visual learning in the low-data regime: a comparative evaluation

Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs), enabling unsupervised pretraining on a `pretext task' that does not require ground-truth labels or annotations. This allows efficient representation learning from massive amounts of unlabeled training data, which in turn leads to increased accuracy on a `downstream task' through supervised transfer learning. Although SSL is conceptually straightforward and broadly applicable, it is not always feasible to collect and/or utilize very large pretraining datasets, especially in real-world application settings. In particular, in specialized, domain-specific scenarios, it may not be practical to assemble a relevant image pretraining dataset on the order of millions of instances, or it may be computationally infeasible to pretrain at this scale. This motivates an investigation into the effectiveness of common SSL pretext tasks when the pretraining dataset is of relatively limited size. In this context, this work introduces a taxonomy of modern visual SSL methods, accompanied by detailed explanations and insights regarding the main categories of approaches, and subsequently conducts a thorough comparative experimental evaluation in the low-data regime, aiming to identify: a) what is learnt via low-data SSL pretraining, and b) how different SSL categories behave in such training scenarios. Interestingly, for domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining on general datasets. Based on the obtained results, valuable insights are highlighted regarding the performance of each category of SSL methods, which in turn suggest straightforward directions for future research in the field.
