Contrastive instance discrimination methods outperform supervised learning on downstream tasks such as image classification and object detection. However, these methods rely heavily on data augmentation during representation learning, and poorly chosen augmentations can degrade the learned representations. Random cropping followed by resizing is a common augmentation in contrastive learning, but it can harm representation learning when the two random crops contain distinct semantic content. To address this issue, this paper introduces LeOCLR (Leveraging Original Images for Contrastive Learning of Visual Representations), a framework that employs a new instance discrimination approach and an adapted loss function that ensures the shared region between positive pairs is semantically correct. Experimental results show that our approach consistently improves representation learning across different datasets compared to baseline models. For example, our approach outperforms MoCo-v2 by 5.1% on ImageNet-1K in linear evaluation and outperforms several other methods on transfer learning tasks.
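To make the cropping concern concrete, the following is a minimal sketch, not the LeOCLR method, that estimates how often two independently sampled random crops of the same image share no pixels at all. The sampler is a simplified stand-in for a typical random-resized-crop augmentation; the function names, the area range (0.08 to 1.0), and the aspect-ratio range (3/4 to 4/3) are assumptions chosen for illustration. Geometric overlap is only a rough proxy for shared semantic content.

```python
import math
import random


def sample_crop(width, height, rng, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3)):
    """Sample one crop box (x0, y0, x1, y1) inside a width x height image.

    Simplified, assumed sampler: area fraction is uniform in `scale`,
    aspect ratio is log-uniform in `ratio`.
    """
    area = rng.uniform(*scale) * width * height
    aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
    # Clamp the side lengths so the box always fits inside the image.
    w = min(width, max(1, round(math.sqrt(area * aspect))))
    h = min(height, max(1, round(math.sqrt(area / aspect))))
    x0 = rng.randint(0, width - w)
    y0 = rng.randint(0, height - h)
    return (x0, y0, x0 + w, y0 + h)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def fraction_disjoint(trials=10000, size=224, seed=0):
    """Estimate how often two random crops of one image do not overlap."""
    rng = random.Random(seed)
    disjoint = 0
    for _ in range(trials):
        first = sample_crop(size, size, rng)
        second = sample_crop(size, size, rng)
        if iou(first, second) == 0.0:
            disjoint += 1
    return disjoint / trials


if __name__ == "__main__":
    print(f"fraction of disjoint crop pairs: {fraction_disjoint():.3f}")
```

Even when two crops do overlap, the shared region may cover background rather than the object of interest, which is the failure case the abstract describes.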