Feature Decoupling-Recycling Network for Fast Interactive Segmentation

Recent interactive segmentation methods iteratively take the source image, user guidance, and the previously predicted mask as input, without exploiting the fact that the source image does not change between interactions. As a result, features are re-extracted from the source image at every interaction, which introduces substantial computational redundancy. In this work, we propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies and then recycles components across user interactions, significantly improving the efficiency of the whole interactive process. Specifically, we apply the Decoupling-Recycling strategy from three perspectives to address three types of discrepancies. First, our model decouples the learning of source image semantics from the encoding of user guidance, so that the two input domains are processed separately. Second, FDRN decouples high-level and low-level features from stratified semantic representations to enhance feature learning. Third, during the encoding of user guidance, current user guidance is decoupled from historical guidance to highlight the effect of the current guidance. We conduct extensive experiments on six datasets from different domains and modalities, which demonstrate the following merits of our model: 1) higher efficiency than other methods, which is particularly advantageous in challenging scenarios requiring long-term interactions (up to 4.25x faster), while achieving favorable segmentation performance; 2) strong applicability to various methods, serving as a universal enhancement technique; 3) good cross-task generalizability, e.g., to medical image segmentation, and robustness against misleading user guidance.
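The recycling idea can be illustrated with a toy sketch: the expensive image encoding runs once and is cached, while only a lightweight guidance encoding runs per click. Everything below is an assumption made for illustration and is not the FDRN architecture: the class `RecyclingSegmenter`, the `(row, col, is_positive)` click format, the arithmetic standing in for both encoders, and the fixed threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical click format: (row, col, is_positive).
Click = Tuple[int, int, bool]


@dataclass
class RecyclingSegmenter:
    """Toy stand-in for a decoupled pipeline; not the actual FDRN model."""

    image: List[List[float]]
    image_encoder_calls: int = 0
    guidance_encoder_calls: int = 0
    _cached_features: Optional[List[List[float]]] = None
    _history: List[Click] = field(default_factory=list)

    def _encode_image(self) -> List[List[float]]:
        # Stand-in for the expensive backbone. It runs only on a cache miss,
        # because the source image does not change between interactions.
        if self._cached_features is None:
            self.image_encoder_calls += 1
            self._cached_features = [[2.0 * p for p in row] for row in self.image]
        return self._cached_features

    def _encode_guidance(self, current: Click) -> List[List[float]]:
        # Stand-in for the cheap guidance branch, which runs at every click.
        # The current click is kept apart from the history and weighted more.
        self.guidance_encoder_calls += 1
        height, width = len(self.image), len(self.image[0])
        guidance = [[0.0] * width for _ in range(height)]
        for row, col, positive in self._history:
            guidance[row][col] += 0.5 if positive else -0.5
        row, col, positive = current
        guidance[row][col] += 1.0 if positive else -1.0
        return guidance

    def interact(self, click: Click) -> List[List[bool]]:
        features = self._encode_image()          # recycled after the first call
        guidance = self._encode_guidance(click)  # recomputed for each click
        self._history.append(click)
        # Arbitrary fusion rule and threshold, chosen only for the demo.
        return [
            [f + g > 1.0 for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(features, guidance)
        ]


seg = RecyclingSegmenter(image=[[0.1, 0.6], [0.4, 0.2]])
for click in [(0, 0, True), (1, 1, False), (1, 0, True)]:
    mask = seg.interact(click)

print(seg.image_encoder_calls, seg.guidance_encoder_calls)  # prints "1 3"
```

In this sketch, three interactions trigger a single image encoding and three guidance encodings. A coupled design that re-encodes the image at every step would instead run the heavy branch three times, so the saving grows with the number of interactions.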
