Learning semantic segmentation requires pixel-wise annotations, which are time-consuming and expensive to collect. To reduce the annotation cost, we propose a superpixel-based active learning (AL) framework, which collects a dominant label per superpixel instead. Specifically, it consists of adaptive superpixel and sieving mechanisms, both designed for AL. At each round of AL, we adaptively merge neighboring pixels with similar learned features into superpixels. We then query a selected subset of these superpixels using an acquisition function that does not assume a uniform superpixel size. This approach is more efficient than existing methods, which rely only on innate features such as RGB color and assume uniform superpixel sizes. Obtaining a dominant label per superpixel drastically reduces the annotators' burden, as it requires fewer clicks. However, it inevitably introduces noisy annotations wherever a superpixel does not match the ground-truth segmentation. To address this issue, we further devise a sieving mechanism that identifies potentially noisy annotations and excludes them from learning. Experiments on the Cityscapes and PASCAL VOC datasets demonstrate the efficacy of the adaptive superpixel and sieving mechanisms.
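To make the dominant-label idea and the noise it introduces concrete, here is a minimal NumPy sketch on a toy 4x4 image. It is an illustration under stated assumptions, not the method of the paper: the function names `dominant_labels` and `sieve`, the toy arrays, and in particular the sieving rule (drop a pixel when the model's probability for its superpixel's dominant label falls below a fixed `threshold`) are hypothetical, since the abstract does not specify how the sieve decides.

```python
import numpy as np


def dominant_labels(gt, superpixels, num_classes):
    """Return the most frequent ground-truth class of each superpixel."""
    num_sp = int(superpixels.max()) + 1
    # counts[s, c] = number of pixels in superpixel s whose true class is c.
    counts = np.zeros((num_sp, num_classes), dtype=np.int64)
    np.add.at(counts, (superpixels.ravel(), gt.ravel()), 1)
    return counts.argmax(axis=1), counts


def sieve(probs, superpixels, dom, threshold=0.5):
    """Hypothetical sieve (an assumption, not the paper's rule).

    Keeps a pixel only if the model assigns at least `threshold`
    probability to the dominant label of the pixel's superpixel.
    """
    pixel_label = dom[superpixels]  # dominant label spread to every pixel
    p_dom = np.take_along_axis(probs, pixel_label[..., None], axis=-1)[..., 0]
    return p_dom >= threshold


# Toy data: two superpixels (left and right halves), two classes.
superpixels = np.array([[0, 0, 1, 1]] * 4)
gt = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],  # pixel (3, 1) disagrees with the rest of superpixel 0
])

dom, counts = dominant_labels(gt, superpixels, num_classes=2)
propagated = dom[superpixels]        # labels the learner would receive
noisy = propagated != gt             # pixels mislabeled by the dominant label

# Stand-in model output: 0.9 on the true class, 0.1 on the other class.
probs = np.where(np.eye(2)[gt] == 1, 0.9, 0.1)
keep = sieve(probs, superpixels, dom)

print("dominant labels:", dom.tolist())
print("noisy pixels:", int(noisy.sum()), "of", noisy.size)
print("pixels kept after sieving:", int(keep.sum()))
```

In this toy case a single click per superpixel labels all 16 pixels, one of them incorrectly, and the assumed confidence-based sieve removes exactly that pixel from the training signal. A real model would not be this well calibrated, so the threshold and the criterion itself would need to be tuned or replaced.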