Several unsupervised image segmentation approaches have been proposed that eliminate the need for dense, manually annotated segmentation masks. However, current models handle either semantic segmentation (e.g., STEGO) or class-agnostic instance segmentation (e.g., CutLER) separately, but not both (i.e., panoptic segmentation). We propose an Unsupervised Universal Segmentation model (U2Seg) that performs multiple image segmentation tasks (instance, semantic, and panoptic) within a novel unified framework. U2Seg generates pseudo semantic labels for these tasks by leveraging self-supervised models followed by clustering; each cluster represents a different semantic and/or instance membership of pixels. We then self-train the model on these pseudo semantic labels, yielding substantial performance gains over specialized methods tailored to each task: +2.6 AP$^{\text{box}}$ over CutLER in unsupervised instance segmentation on COCO and +7.0 PixelAcc over STEGO in unsupervised semantic segmentation on COCOStuff. Moreover, our method establishes a new baseline for unsupervised panoptic segmentation, which has not been previously explored. U2Seg is also a strong pretrained model for few-shot segmentation, surpassing CutLER by +5.0 AP$^{\text{mask}}$ when trained in a low-data regime, e.g., with only 1% of COCO labels. We hope our simple yet effective method can inspire more research on unsupervised universal image segmentation.
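To make the pseudo-labeling step concrete, the following is a minimal sketch of turning self-supervised features into cluster IDs that could serve as pseudo semantic labels. The abstract does not name the feature extractor or the clustering algorithm, so everything here is an assumption for illustration: plain k-means is used as the clustering method, `features` stands in for one embedding per segment produced by some self-supervised backbone, and the function name `kmeans_pseudo_labels` and its parameters are hypothetical rather than part of U2Seg.

```python
import numpy as np


def kmeans_pseudo_labels(features, num_clusters, num_iters=20, seed=0):
    """Assign each feature vector a cluster ID using plain k-means.

    Assumption: `features` is an (N, D) array holding one self-supervised
    embedding per segment; the returned (N,) integer array plays the role
    of pseudo semantic labels. This is an illustrative stand-in, not the
    authors' implementation.
    """
    rng = np.random.default_rng(seed)
    feats = np.asarray(features, dtype=np.float64)

    # L2-normalize so that distances reflect direction rather than magnitude.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    feats = feats / np.maximum(norms, 1e-12)

    # Initialize centroids from distinct, randomly chosen samples.
    init_idx = rng.choice(len(feats), size=num_clusters, replace=False)
    centroids = feats[init_idx].copy()

    labels = np.zeros(len(feats), dtype=np.int64)
    for _ in range(num_iters):
        # Squared Euclidean distance from every sample to every centroid.
        diffs = feats[:, None, :] - centroids[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Move each centroid to the mean of its members; keep the old
        # centroid if a cluster ends up empty.
        for k in range(num_clusters):
            members = feats[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)

    return labels


# Toy usage: two well-separated groups of 2-D "embeddings".
group_a = np.array([[1.0, 0.0], [0.98, 0.05], [0.99, -0.03], [0.97, 0.02]])
group_b = np.array([[0.0, 1.0], [0.05, 0.98], [-0.03, 0.99], [0.02, 0.97]])
toy_features = np.vstack([group_a, group_b])
toy_labels = kmeans_pseudo_labels(toy_features, num_clusters=2)
```

In a self-training setup of the kind the abstract describes, labels like `toy_labels` would be attached to the corresponding segments and used as training targets in place of human annotations; the number of clusters is a free choice in this sketch.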