Training a computer vision system to segment a novel class typically requires collecting and painstakingly annotating many images containing objects of that class. Few-shot segmentation techniques reduce the number of images required to learn to segment a new class, but they still require careful annotation of object boundaries. Interactive segmentation techniques, on the other hand, focus only on incrementally improving the segmentation of one object at a time (typically using clicks given by an expert) in a class-agnostic manner. We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes. Instead of trivially feeding interactive segmentation masks as ground truth to a few-shot segmentation model, we propose IFSENet, which accepts sparse supervision in the form of clicks on one or a few support images and generates masks on support images (training, clicked upon at least once) as well as query images (test, never clicked upon). To flexibly trade off effort for accuracy, images and clicks can be added incrementally to the support set to further improve the segmentation of both support and query images. When tested on query images from the Pascal and SBD datasets, the proposed model approaches the accuracy of previous state-of-the-art few-shot segmentation models with considerably lower annotation effort (clicks instead of full masks). It also works well as an interactive segmentation method on support images.