We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the ability of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in an image, and then learns a detector on these masks using our robust loss function. We further improve performance by self-training the model on its own predictions. Compared to prior work, CutLER is simpler, is compatible with different detection architectures, and detects multiple objects. CutLER is also a zero-shot unsupervised detector: it improves detection performance (AP50) by more than 2.7× on 11 benchmarks across domains such as video frames, paintings, and sketches. With finetuning, CutLER serves as a low-shot detector, surpassing MoCo-v2 by 7.3% APbox and 6.6% APmask on COCO when training with 5% of the labels.
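The abstract does not spell out the robust loss. One way such a loss could tolerate objects that the coarse masks miss is to skip the penalty for predicted regions that barely overlap any pseudo-mask, so the detector is not punished for finding objects the masks left out. The sketch below illustrates that idea only. The function names (`box_iou`, `overlap_gated_loss`), the box format, the use of boxes rather than masks, and the threshold value are assumptions made for illustration, not details taken from this text.

```python
import numpy as np


def box_iou(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2) format."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Intersection corners, broadcast to shape (N, M, 2).
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    # Negative extents mean no overlap, so clip them to zero before multiplying.
    inter = np.clip(bottom_right - top_left, 0.0, None).prod(axis=2)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union


def overlap_gated_loss(per_region_loss, pred_boxes, pseudo_boxes, iou_threshold=0.01):
    """Average the loss over predicted regions that overlap some pseudo-mask box.

    Assumption (not from the abstract): regions whose best IoU with every
    pseudo box is at or below `iou_threshold` contribute no loss at all.
    """
    per_region_loss = np.asarray(per_region_loss, dtype=float)
    if len(pred_boxes) == 0 or len(pseudo_boxes) == 0:
        return 0.0
    max_iou = box_iou(pred_boxes, pseudo_boxes).max(axis=1)
    keep = max_iou > iou_threshold
    if not keep.any():
        return 0.0
    return float(per_region_loss[keep].mean())


# Toy usage: the third prediction overlaps no pseudo box, so its large loss is ignored.
pseudo = [[0, 0, 10, 10]]
preds = [[0, 0, 10, 10], [5, 5, 15, 15], [50, 50, 60, 60]]
losses = [1.0, 3.0, 100.0]
gated = overlap_gated_loss(losses, preds, pseudo)
```

In this toy case only the first two predictions are kept, so the third region's loss of 100 has no effect on the result. A plain average over all three regions would instead be dominated by that term.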