Object-Aware Cropping for Self-Supervised Learning

A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the objects of interest, which the learned representation will capture. This assumption mostly holds in datasets such as ImageNet, where there is a large, centered object that is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real-world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets. We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the model to learn both object-level and scene-level semantic representations. This approach, which we call object-aware cropping, yields significant improvements over random scene-level cropping on classification and object detection benchmarks. For example, on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene-level cropping using MoCo-v2 based pre-training. We also show significant improvements over state-of-the-art self-supervised learning approaches on COCO and PASCAL-VOC object detection and segmentation tasks. Our approach is efficient, simple and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.
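The crop-replacement idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that proposal boxes have already been produced by some object proposal algorithm and arrive as hypothetical `(x0, y0, x1, y1)` pixel tuples. It also assumes one plausible way of turning a proposal into a view: RandomResizedCrop-style sampling restricted to the chosen box. The helper names (`random_resized_crop_in`, `clip_box`, `object_aware_views`) and the scale and aspect-ratio ranges are illustrative choices. The sketch returns crop boxes only; the actual cropping and resizing are left to whatever image library the pipeline uses.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def random_resized_crop_in(region: Box, rng: random.Random,
                           scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3),
                           max_tries: int = 10) -> Box:
    """RandomResizedCrop-style box sampling restricted to `region`.

    Area fraction is uniform in `scale`; aspect ratio is log-uniform in `ratio`.
    Simplification: falls back to the whole region if no valid crop is found.
    """
    rx0, ry0, rx1, ry1 = region
    rw, rh = rx1 - rx0, ry1 - ry0
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(max_tries):
        target_area = rw * rh * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= rw and 0 < h <= rh:
            x0 = rx0 + rng.randint(0, rw - w)
            y0 = ry0 + rng.randint(0, rh - h)
            return (x0, y0, x0 + w, y0 + h)
    return region


def clip_box(box: Box, width: int, height: int) -> Optional[Box]:
    """Clip a proposal to the image; return None if nothing is left."""
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)
    return (x0, y0, x1, y1) if x1 > x0 and y1 > y0 else None


def object_aware_views(width: int, height: int, proposals: List[Box],
                       rng: random.Random,
                       both_object: bool = False) -> Tuple[Box, Box]:
    """Return two crop boxes that form a positive pair.

    both_object=False: one object crop plus one scene crop of the full image.
    both_object=True:  two crops taken from the same proposal box.
    Without any usable proposal, this reduces to plain random scene cropping.
    Assumption: object crops are sampled inside a proposal box; this is not
    necessarily the exact procedure used in the paper.
    """
    image = (0, 0, width, height)
    valid = [b for b in (clip_box(p, width, height) for p in proposals) if b]
    if not valid:
        return (random_resized_crop_in(image, rng),
                random_resized_crop_in(image, rng))
    obj = rng.choice(valid)  # one proposal shared by the object view(s)
    view_a = random_resized_crop_in(obj, rng)
    view_b = random_resized_crop_in(obj if both_object else image, rng)
    return view_a, view_b
```

For example, `object_aware_views(640, 480, [(40, 60, 120, 140)], random.Random(0))` returns one box inside the proposal and one scene-level box inside the full image. Setting `both_object=True` draws both views from the same proposal, which corresponds to replacing both random crops. In this sketch, the rest of the pipeline (color augmentations, encoder, and the MoCo-v2 or other framework loss) would consume the two views unchanged. That is why the change can slot into existing contrastive and non-contrastive frameworks.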
