In the past few years, contrastive learning has played a central role in the success of visual unsupervised representation learning. Around the same time, high-performance non-contrastive learning methods have been developed as well. While most works use only two views, we carefully review the existing multi-view methods and propose a general multi-view strategy that can improve the learning speed and performance of any contrastive or non-contrastive method. We first analyze CMC's full-graph paradigm and empirically show that the learning speed with $K$ views can be increased by $_{K}\mathrm{C}_{2}$ times at small learning rates and in early training. Then, we upgrade CMC's full-graph by mixing in views created by a crop-only augmentation, adopting small-size views as in SwAV multi-crop, and modifying the negative sampling. The resulting multi-view strategy is called ECPP (Efficient Combinatorial Positive Pairing). We investigate the effectiveness of ECPP by applying it to SimCLR and assessing the linear evaluation performance on CIFAR-10 and ImageNet-100. On each benchmark, we achieve state-of-the-art performance. On ImageNet-100, ECPP-boosted SimCLR outperforms supervised learning.
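To make the combinatorial pairing concrete, the following is a minimal sketch of a full-graph objective over $K$ views: every unordered pair of views is treated as a positive pair, giving $_{K}\mathrm{C}_{2} = K(K-1)/2$ pairwise terms. The function names (`positive_pairs`, `info_nce`, `full_graph_loss`), the temperature value, and the choice to sum rather than average the pairwise losses are assumptions made for illustration. The pairwise loss is a simplified InfoNCE, not SimCLR's exact NT-Xent loss, and the sketch does not include ECPP's crop-only views, small-size views, or modified negative sampling.

```python
from itertools import combinations
from math import comb

import numpy as np


def positive_pairs(num_views):
    """Return all unordered view-index pairs (i, j) with i < j."""
    return list(combinations(range(num_views), 2))


def info_nce(za, zb, temperature=0.5):
    """Simplified InfoNCE between two batches of embeddings of shape (N, D).

    Row n of za and row n of zb come from the same image (positive pair);
    all other rows of zb act as negatives. This is an illustrative
    simplification, not the exact loss used in the paper.
    """
    # L2-normalize so the dot product is a cosine similarity.
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


def full_graph_loss(views, temperature=0.5):
    """Sum the pairwise loss over all K-choose-2 view pairs (assumed reduction)."""
    return sum(
        info_nce(views[i], views[j], temperature)
        for i, j in positive_pairs(len(views))
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four hypothetical views: batches of 8 embeddings with 16 dimensions each.
    views = [rng.normal(size=(8, 16)) for _ in range(4)]
    print("pairs:", len(positive_pairs(4)), "expected:", comb(4, 2))
    print("full-graph loss:", full_graph_loss(views))
```

With two views the sketch reduces to a single pairwise term, which corresponds to the usual two-view setting; each additional view adds one positive pair per existing view.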