Recently, self-supervised learning (SSL) has achieved tremendous success in learning image representations. Despite this empirical success, most SSL methods are rather "inefficient" learners, typically taking hundreds of training epochs to fully converge. In this work, we show that the key to efficient self-supervised learning is to increase the number of crops taken from each image instance. Building on one of the state-of-the-art SSL methods, we introduce a simple form of self-supervised learning called Extreme-Multi-Patch Self-Supervised-Learning (EMP-SSL). It does not rely on many of the heuristic techniques common in SSL, such as weight sharing between the branches, feature-wise normalization, output quantization, and stop gradient, and it reduces the number of training epochs by two orders of magnitude. We show that the proposed method converges to 85.1% on CIFAR-10, 58.5% on CIFAR-100, 38.1% on Tiny ImageNet, and 58.5% on ImageNet-100 in just one epoch. Furthermore, the proposed method achieves 91.5% on CIFAR-10, 70.1% on CIFAR-100, 51.5% on Tiny ImageNet, and 78.9% on ImageNet-100 with linear probing in fewer than ten training epochs. In addition, we show that EMP-SSL transfers significantly better to out-of-domain datasets than baseline SSL methods. We will release the code at https://github.com/tsb0601/EMP-SSL.
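To make the central idea concrete (taking many crops from each image instance), the following is a minimal sketch of random patch sampling with NumPy. It is not the released EMP-SSL code: the function name `sample_patches`, the patch count of 20, and the patch size of 16 are illustrative assumptions that do not come from the abstract, and the sketch omits any augmentation, encoder, or training objective.

```python
import numpy as np


def sample_patches(image, num_patches=20, patch_size=16, seed=0):
    """Return `num_patches` random square crops of an H x W x C image.

    All defaults are hypothetical values chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    height, width = image.shape[:2]
    if patch_size > min(height, width):
        raise ValueError("patch_size must fit inside the image")
    # Draw top-left corners uniformly so that every crop stays in bounds.
    tops = rng.integers(0, height - patch_size + 1, size=num_patches)
    lefts = rng.integers(0, width - patch_size + 1, size=num_patches)
    patches = [
        image[top:top + patch_size, left:left + patch_size]
        for top, left in zip(tops, lefts)
    ]
    # Stack into one array of shape (num_patches, patch_size, patch_size, C).
    return np.stack(patches)


image = np.zeros((32, 32, 3), dtype=np.float32)  # CIFAR-sized placeholder image
patches = sample_patches(image)
print(patches.shape)  # (20, 16, 16, 3)
```

In a full pipeline, each patch would presumably be augmented and passed through an encoder; those steps are outside the scope of this sketch.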