Joint Embedding Self-Supervised Learning (JE-SSL) has developed rapidly in recent years, owing to its promise to effectively leverage large amounts of unlabeled data. The development of JE-SSL methods has been driven primarily by the search for ever-increasing downstream classification accuracy, has relied on huge computational resources, and has typically built upon insights and intuitions inherited from a close parent JE-SSL method. This has unwittingly led to numerous preconceived ideas that carried over across methods, e.g., that SimCLR requires very large mini-batches to yield competitive accuracy, or that strong and computationally slow data augmentations are required. In this work, we debunk several such ill-formed a priori ideas in the hope of unleashing the full potential of JE-SSL, free of unnecessary limitations. In fact, when carefully evaluating performance across different downstream tasks and properly optimizing the hyper-parameters of the methods, we most often, if not always, see that these widespread misconceptions do not hold. For example, we show that it is possible to train SimCLR to learn useful representations while using a single image patch as the negative example and simple Gaussian noise as the only data augmentation for the positive pair. Along these lines, in the hope of democratizing JE-SSL and allowing researchers to easily make more extensive evaluations of their methods, we introduce an optimized PyTorch library for SSL.
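To make the SimCLR example above concrete, the following is a minimal, framework-free sketch of a contrastive (InfoNCE-style) loss computed with one positive pair and a single negative, where the two positive views are produced only by adding Gaussian noise. It is an illustration under stated assumptions, not the paper's implementation or the API of the library it introduces: the function names (`add_gaussian_noise`, `info_nce_single_negative`), the noise level, the temperature, and the use of raw vectors in place of encoder outputs are all hypothetical choices made for this sketch.

```python
import math
import random


def add_gaussian_noise(x, sigma, rng):
    """Return a noisy copy of x; here this is the only 'augmentation'."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def info_nce_single_negative(anchor, positive, negative, temperature=0.5):
    """InfoNCE-style loss with one positive and one negative.

    loss = -log( exp(s_pos) / (exp(s_pos) + exp(s_neg)) ),
    where s_* are temperature-scaled cosine similarities.
    """
    s_pos = cosine_similarity(anchor, positive) / temperature
    s_neg = cosine_similarity(anchor, negative) / temperature
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(s_pos, s_neg)
    log_denominator = m + math.log(math.exp(s_pos - m) + math.exp(s_neg - m))
    return log_denominator - s_pos


# Assumption: raw vectors stand in for encoder embeddings of image patches.
rng = random.Random(0)
image = [rng.random() for _ in range(64)]
other_patch = [rng.random() for _ in range(64)]  # the single negative

view_a = add_gaussian_noise(image, 0.1, rng)  # positive pair: two noisy
view_b = add_gaussian_noise(image, 0.1, rng)  # copies of the same input

loss = info_nce_single_negative(view_a, view_b, other_patch)
print(f"contrastive loss with one negative: {loss:.4f}")
```

In an actual training setup, the vectors passed to the loss would be the outputs of a trainable encoder and projector, and the loss would be averaged over a mini-batch and minimized by gradient descent; the sketch only shows how the loss is formed when there is one negative and noise is the sole augmentation.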