Recent interactive segmentation methods iteratively take the source image, the user guidance, and the previously predicted mask as input, without exploiting the fact that the source image does not change between interactions. As a result, features are re-extracted from the source image at every interaction, which introduces substantial computational redundancy. In this work, we propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies and then recycles components across user interactions, significantly improving the efficiency of the whole interactive process. Specifically, we apply the Decoupling-Recycling strategy from three perspectives to address three types of discrepancies. First, our model decouples the learning of source image semantics from the encoding of user guidance, so that the two input domains are processed separately. Second, FDRN decouples high-level and low-level features from stratified semantic representations to enhance feature learning. Third, during the encoding of user guidance, the current guidance is decoupled from the historical guidance to highlight the effect of the current guidance. We conduct extensive experiments on six datasets from different domains and modalities, which demonstrate the following merits of our model: 1) higher efficiency than other methods, which is particularly advantageous in challenging scenarios requiring long-term interactions (up to 4.25x faster), while achieving favorable segmentation performance; 2) strong applicability to various methods, serving as a universal enhancement technique; 3) good cross-task generalizability, e.g., to medical image segmentation, and robustness against misleading user guidance.
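The first decoupling (image semantics versus user guidance) can be illustrated with a minimal sketch. It is a toy, not the FDRN architecture: the class `ToyRecyclingSegmenter`, its methods, the mean-centering "encoder", and the additive fusion rule are all hypothetical stand-ins chosen only to show the expensive image pass running once while the cheap guidance pass runs at every interaction.

```python
from typing import List, Optional, Tuple

Click = Tuple[int, int, bool]  # (row, col, is_positive); hypothetical click format


class ToyRecyclingSegmenter:
    """Toy illustration of decoupling and recycling; not the FDRN model."""

    def __init__(self, image: List[List[float]]) -> None:
        self.image = image
        self.image_encoder_calls = 0  # counts the "expensive" passes
        self._cached_features: Optional[List[List[float]]] = None

    def _encode_image(self) -> List[List[float]]:
        # Stand-in for a heavy backbone: mean-centered pixel intensities.
        self.image_encoder_calls += 1
        flat = [v for row in self.image for v in row]
        mean = sum(flat) / len(flat)
        return [[v - mean for v in row] for row in self.image]

    def _encode_guidance(self, clicks: List[Click]) -> List[List[float]]:
        # Stand-in for a lightweight guidance encoder: a +1 / -1 click map.
        height, width = len(self.image), len(self.image[0])
        guidance = [[0.0] * width for _ in range(height)]
        for row, col, is_positive in clicks:
            guidance[row][col] += 1.0 if is_positive else -1.0
        return guidance

    def predict(self, clicks: List[Click]) -> List[List[bool]]:
        # Recycle: the image features are computed on the first call only.
        if self._cached_features is None:
            self._cached_features = self._encode_image()
        guidance = self._encode_guidance(clicks)
        # Assumed fusion rule: add the two maps and threshold at zero.
        return [
            [f + g > 0.0 for f, g in zip(feature_row, guidance_row)]
            for feature_row, guidance_row in zip(self._cached_features, guidance)
        ]


# Two interactions on the same image share one image-encoder pass.
segmenter = ToyRecyclingSegmenter([[0.1, 0.9], [0.2, 0.8]])
first_mask = segmenter.predict([(0, 0, True)])
second_mask = segmenter.predict([(0, 0, True), (0, 1, False)])
```

In this sketch the per-interaction cost is limited to the guidance encoding and the fusion step, which is the source of the savings the abstract attributes to long interaction sequences; the real model's encoders and fusion are learned networks that this toy does not attempt to reproduce.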