Click-based interactive segmentation generates target masks from user clicks, enabling efficient pixel-level annotation and image editing. Target ambiguity remains a problem in this task, limiting both the accuracy and the efficiency of segmentation: in scenes with rich context, a single click may correspond to multiple potential targets, yet most previous interactive segmentors generate only a single mask and cannot handle this ambiguity. In this paper, we propose PiClick, a novel interactive segmentation network that yields all potentially reasonable masks and suggests the most plausible one to the user. Specifically, PiClick uses a Transformer-based architecture to generate all potential target masks through mutually interactive mask queries. Moreover, a Target Reasoning module is designed to automatically suggest the user-desired mask from among all candidates, relieving target ambiguity and reducing extra human effort. Extensive experiments on 9 interactive segmentation datasets demonstrate that PiClick performs favorably against previous state-of-the-art methods in terms of segmentation results. We also show that PiClick effectively reduces the human effort required for annotating and picking the desired masks. To ease usage and inspire future research, we release the source code of PiClick together with a plug-and-play annotation tool at https://github.com/cilinyan/PiClick.
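To make the "many candidates, one suggestion" idea concrete, the following is a minimal sketch of how a suggestion step could be exposed to an annotation tool. It is not PiClick's Target Reasoning module, which is a learned component. The function name `suggest_mask`, the per-candidate `scores`, and the rule "highest score among masks that cover the click" are all assumptions made for illustration.

```python
import numpy as np


def suggest_mask(candidate_masks, scores, click):
    """Pick one candidate mask for a click (illustrative rule, not PiClick's).

    candidate_masks: array of shape (K, H, W), nonzero where a mask is on.
    scores: K hypothetical plausibility scores, one per candidate.
    click: (row, col) pixel position of a positive user click.
    Returns the index of the suggested mask, or None if no mask covers the click.
    """
    masks = np.asarray(candidate_masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    row, col = click

    # A candidate is only eligible if it actually contains the clicked pixel.
    covers_click = masks[:, row, col]
    if not covers_click.any():
        return None

    # Ineligible candidates get -inf so argmax can never select them.
    eligible_scores = np.where(covers_click, scores, -np.inf)
    return int(np.argmax(eligible_scores))


# Toy scene: one click at (1, 1) is ambiguous between a whole object and a part.
whole = np.ones((4, 4), dtype=bool)
part = np.zeros((4, 4), dtype=bool)
part[:2, :2] = True
other = np.zeros((4, 4), dtype=bool)
other[3, 3] = True

candidates = np.stack([whole, part, other])
scores = [0.6, 0.8, 0.9]  # made-up values for the example

best = suggest_mask(candidates, scores, click=(1, 1))
```

In this toy scene the third candidate has the highest score but does not contain the click, so it is never suggested. The remaining candidates stay available, so the user can still switch to a different one if the suggestion is wrong.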