Click-based interactive segmentation (IS) aims to extract target objects under user interaction. Most current deep learning (DL)-based methods for this task follow the general pipelines of semantic segmentation. Despite achieving promising performance, they do not fully and explicitly utilize and propagate the click information, which inevitably leads to unsatisfactory segmentation results, even at the clicked points. To address this issue, we formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image. To solve this model, we use amortized variational inference to approximate the intractable GP posterior in a data-driven manner, and then decouple the approximated GP posterior into double space forms for efficient sampling with linear complexity. We then construct the corresponding GP classification framework, named GPCIS, which integrates the deep kernel learning mechanism for greater flexibility. GPCIS has two main distinguishing features: 1) under the explicit guidance of the derived GP posterior, the information contained in clicks can be finely propagated to the entire image and thereby improve the segmentation; 2) the accuracy of predictions at clicks has good theoretical support. Comprehensive experiments on several benchmarks, comparing GPCIS against representative methods both quantitatively and qualitatively, substantiate these merits as well as its good generality and high efficiency.
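To make the idea of propagating click information through a GP posterior concrete, the following is a minimal sketch and not the GPCIS implementation. It assumes an exact GP regression on click labels in {-1, +1} as a stand-in for the pixel-wise classification model, a fixed RBF kernel on hand-made per-pixel features in place of a learned deep kernel, and it omits amortized variational inference and decoupled sampling entirely. The names `rbf_kernel` and `propagate_clicks`, the toy image, and all parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def propagate_clicks(features, click_idx, click_labels, lengthscale=1.0, noise=1e-4):
    """GP posterior mean at every pixel, conditioned on the clicked pixels.

    Assumption: labels are treated as regression targets in {-1, +1};
    this is a simplification, not the classification model of the paper.
    """
    clicked = features[click_idx]
    k_cc = rbf_kernel(clicked, clicked, lengthscale) + noise * np.eye(len(click_idx))
    k_pc = rbf_kernel(features, clicked, lengthscale)
    alpha = np.linalg.solve(k_cc, np.asarray(click_labels, dtype=float))
    return k_pc @ alpha


# Toy 8x8 image: the left half is the object (intensity 1), the right half is background.
h, w = 8, 8
ys, xs = np.mgrid[0:h, 0:w]
intensity = (xs < w // 2).astype(float)

# Hypothetical per-pixel features: scaled intensity plus normalized coordinates.
features = np.stack(
    [3.0 * intensity.ravel(), ys.ravel() / h, xs.ravel() / w], axis=1
)

# One positive click at (row 2, col 1) and one negative click at (row 5, col 6).
click_idx = [2 * w + 1, 5 * w + 6]
click_labels = [1, -1]

mean = propagate_clicks(features, click_idx, click_labels)
mask = mean.reshape(h, w) > 0  # predicted foreground mask
```

In this sketch the linear solve involves only the clicked pixels, and the final matrix-vector product is linear in the number of pixels. With a small `noise` value the posterior mean at the clicked pixels stays close to the click labels, which gives an informal picture of why predictions at clicks can be controlled in a GP formulation.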