InterFormer: Real-time Interactive Image Segmentation

Interactive image segmentation enables annotators to efficiently produce pixel-level annotations for segmentation tasks. However, the existing interactive segmentation pipeline uses computation inefficiently because of two issues. First, each click an annotator makes depends on the model's feedback to the previous click, so this serial interaction cannot exploit the model's capacity for parallel computation. Second, at every interaction step the model reprocesses the unchanged image together with the sparse, changing clicks, which makes the process highly repetitive and redundant. To make computation efficient, we propose InterFormer, a method that follows a new pipeline to address these issues. InterFormer separates the most time-consuming part of the existing process, i.e., image processing, and performs it in advance as a preprocessing step. Specifically, InterFormer employs a large vision transformer (ViT) on high-performance devices to preprocess images in parallel, and then uses a lightweight module called interactive multi-head self-attention (I-MSA) for interactive segmentation. The I-MSA module uses the preprocessed features to respond to annotator inputs efficiently and in real time. Furthermore, because the I-MSA module can be deployed on low-power devices, it extends the practical applications of interactive segmentation. Experiments on several datasets demonstrate the effectiveness of InterFormer: it outperforms previous interactive segmentation models in both computational efficiency and segmentation quality, achieving real-time, high-quality interactive segmentation on CPU-only devices. The code is available at https://github.com/YouHuang67/InterFormer.
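The split between a heavy encoder that runs once per image and a cheap module that runs on every click can be sketched as follows. This is a minimal, hypothetical illustration in plain Python. `CachedEncoder`, `interactive_step`, and the toy two-dimensional features are assumptions made for this sketch. The nearest-click similarity rule in `interactive_step` is a stand-in and is not the paper's ViT or I-MSA. The sketch only shows the caching pattern: the encoder runs once per image, and every later click reuses its cached features.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Features = List[Vector]  # hypothetical layout: one feature vector per image token


class CachedEncoder:
    """Run the expensive image encoder once per image and reuse its output."""

    def __init__(self, encoder: Callable[[List[float]], Features]) -> None:
        self.encoder = encoder  # stands in for a large ViT (assumption)
        self.calls = 0          # counts how often the expensive encoder actually ran
        self._cache: Dict[str, Features] = {}

    def features(self, image_id: str, image: List[float]) -> Features:
        if image_id not in self._cache:
            self.calls += 1
            self._cache[image_id] = self.encoder(image)
        return self._cache[image_id]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def interactive_step(feats: Features, clicks: List[Tuple[int, bool]]) -> List[int]:
    """Cheap per-click update: label each token by its most similar click.

    Toy stand-in for a lightweight interactive module; NOT the paper's I-MSA.
    `clicks` holds (token_index, is_positive) pairs.
    """
    mask = []
    for f in feats:
        _, positive = max(clicks, key=lambda c: dot(f, feats[c[0]]))
        mask.append(1 if positive else 0)
    return mask


# Usage: a toy "image" of four pixel intensities and two annotator clicks.
enc = CachedEncoder(lambda img: [[p, 1.0 - p] for p in img])
image = [0.9, 0.8, 0.1, 0.2]
clicks: List[Tuple[int, bool]] = []
masks = []
for click in [(0, True), (2, False)]:  # positive click, then negative click
    clicks.append(click)
    feats = enc.features("img-0", image)  # cache hit after the first click
    masks.append(interactive_step(feats, clicks))

mask = masks[-1]
print(mask, enc.calls)  # [1, 1, 0, 0] 1
```

The design choice mirrors the abstract's argument. Only `interactive_step` sits in the per-click loop, so its cost, not the encoder's, determines interactive latency.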

翻译：交互式图像分割可使标注者高效地完成分割任务的像素级标注。然而，现有交互式分割流程因以下两个问题导致交互模型计算效率低下：其一，标注者的后续点击需依赖模型对前次点击的反馈，这种串行交互模式无法充分利用模型的并行计算能力；其二，在每次交互步骤中，模型需同时处理不变图像与稀疏的变量点击，导致流程高度重复冗余。为实现高效计算，我们提出名为InterFormer的方法，通过全新流程解决上述问题。InterFormer从现有流程中提取并预处理计算耗时的部分（即图像处理）。具体而言，InterFormer在高性能设备上采用大型视觉Transformer（ViT）并行预处理图像，随后利用轻量级模块——交互式多头自注意力（I-MSA）完成交互式分割。此外，I-MSA模块可部署于低功耗设备，拓展了交互式分割的实际应用场景。该模块利用预处理特征实时高效响应标注者输入。在多个数据集上的实验表明，InterFormer在计算效率与分割质量上均优于以往交互式分割模型，能在仅含CPU的设备上实现实时高质量交互式分割。代码开源地址：https://github.com/YouHuang67/InterFormer。