Asymmetric appearance between the two views of a positive pair effectively reduces the risk of representation degradation in contrastive learning. However, positive pairs constructed by existing methods still share a large amount of appearance similarity, which inhibits further improvement of the learned representations. In this paper, we propose a novel asymmetric patch sampling strategy for contrastive learning that further boosts appearance asymmetry to obtain better representations. Specifically, dual patch sampling strategies are applied to a given image to obtain an asymmetric positive pair. First, sparse patch sampling is conducted to obtain the first view, which reduces the spatial redundancy of the image and allows a more asymmetric view. Second, selective patch sampling is proposed to construct another view with a large appearance discrepancy relative to the first one. Because the appearance similarity between the views of a positive pair is negligible, the trained model is encouraged to capture semantic similarity rather than low-level similarity. Experimental results demonstrate that our proposed method significantly outperforms existing self-supervised methods on both ImageNet-1K and the CIFAR datasets, e.g., a 2.5% improvement in fine-tuning accuracy on CIFAR100. Furthermore, our method achieves state-of-the-art performance on the downstream tasks of object detection and instance segmentation on COCO. Additionally, compared to other self-supervised methods, our method is more efficient in both memory and computation during training. The source code is available at https://github.com/visresearch/aps.
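The two sampling steps described above can be illustrated with a small sketch. This is not the released implementation: the abstract does not state the exact sampling rules, so the code below assumes that the first view is a uniform random sparse subset of a patch grid and that "selective" sampling can be approximated by drawing the second view only from patches the first view did not use. The function names, the 14x14 grid, the 16-pixel patch size, and the 25% keep ratio are all hypothetical choices made for illustration.

```python
import random


def asymmetric_patch_indices(grid_size=14, keep_ratio=0.25, seed=0):
    """Return two disjoint, sorted lists of flat patch indices for one image.

    Assumption: disjointness stands in for "large appearance discrepancy";
    the method's actual selection rule may differ.
    """
    rng = random.Random(seed)
    num_patches = grid_size * grid_size
    num_keep = int(num_patches * keep_ratio)
    if num_keep < 1 or 2 * num_keep > num_patches:
        raise ValueError("keep_ratio must leave room for two disjoint views")

    # View 1: sparse, uniform random subset of the patch grid.
    view1 = rng.sample(range(num_patches), num_keep)

    # View 2: drawn only from patches that view 1 did not use.
    used = set(view1)
    remaining = [i for i in range(num_patches) if i not in used]
    view2 = rng.sample(remaining, num_keep)
    return sorted(view1), sorted(view2)


def patch_box(index, grid_size=14, patch_size=16):
    """Map a flat patch index to its (top, left, bottom, right) pixel box."""
    row, col = divmod(index, grid_size)
    top, left = row * patch_size, col * patch_size
    return top, left, top + patch_size, left + patch_size


view1, view2 = asymmetric_patch_indices()
boxes1 = [patch_box(i) for i in view1]  # pixel regions forming the first view
boxes2 = [patch_box(i) for i in view2]  # pixel regions forming the second view
```

Because each view keeps only a fraction of the patches, an encoder that processes patch tokens would see far fewer tokens per view, which is one plausible reading of the memory and computation savings mentioned above.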