Interactive segmentation aims to extract objects of interest from an image based on user-provided clicks. In real-world applications, there is often a need to segment a series of images featuring the same target object. However, existing methods typically process one image at a time and fail to consider the sequential nature of the images. To overcome this limitation, we propose a novel method called Sequence Prompt Transformer (SPT), the first to utilize sequential image information for interactive segmentation. Our model comprises two key components: (1) the Sequence Prompt Transformer (SPT), which acquires information from a sequence of images, clicks, and masks to improve accuracy; and (2) Top-k Prompt Selection (TPS), which selects precise prompts for SPT to further improve segmentation quality. Additionally, we create the ADE20K-Seq benchmark to better evaluate model performance. We evaluate our approach on multiple benchmark datasets and show that our model surpasses state-of-the-art methods across all datasets.