Amodal Instance Segmentation (AIS) is a challenging task because it requires predicting both the visible and the occluded parts of objects in an image. Existing AIS methods rely on a bidirectional approach that includes both a transition from amodal features to visible features (amodal-to-visible) and a transition from visible features to amodal features (visible-to-amodal). We observe that using amodal features in the amodal-to-visible transition can confuse the visible features, because amodal features carry extra information about occluded or hidden segments that does not appear in the visible region. Consequently, the quality of the visible features is compromised in the subsequent visible-to-amodal transition. To tackle this issue, we introduce ShapeFormer, a decoupled Transformer-based model with a visible-to-amodal transition only. It makes the relationship between the output segmentations explicit and avoids the need for amodal-to-visible transitions. ShapeFormer comprises three key modules: (i) a Visible-Occluding Mask Head for predicting visible segmentation with occlusion awareness, (ii) a Shape-Prior Amodal Mask Head for predicting amodal and occluded masks, and (iii) a Category-Specific Shape Prior Retriever for providing shape prior knowledge. Comprehensive experiments and extensive ablation studies across various AIS benchmarks demonstrate the effectiveness of ShapeFormer. The code is available at: https://github.com/UARK-AICV/ShapeFormer
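To make the three output segmentations concrete, the following is a minimal sketch of how visible, occluded, and amodal masks relate under the usual AIS definition (the amodal mask covers the visible part plus the occluded part). It is not ShapeFormer's implementation; the helper names `occluded_from` and `amodal_from` and the toy masks are assumptions introduced only for illustration.

```python
from typing import List

# Binary mask as nested lists; 1 means the pixel belongs to the object.
Mask = List[List[int]]


def occluded_from(amodal: Mask, visible: Mask) -> Mask:
    """Hypothetical helper: occluded pixels are amodal pixels that are not visible."""
    return [
        [a & (1 - v) for a, v in zip(amodal_row, visible_row)]
        for amodal_row, visible_row in zip(amodal, visible)
    ]


def amodal_from(visible: Mask, occluded: Mask) -> Mask:
    """Hypothetical helper: the amodal mask is the union of visible and occluded pixels."""
    return [
        [v | o for v, o in zip(visible_row, occluded_row)]
        for visible_row, occluded_row in zip(visible, occluded)
    ]


# Toy 2x4 example: the object spans three columns, but the third is hidden.
visible = [[1, 1, 0, 0],
           [1, 1, 0, 0]]
amodal = [[1, 1, 1, 0],
          [1, 1, 1, 0]]

occluded = occluded_from(amodal, visible)
reconstructed = amodal_from(visible, occluded)
```

In this toy case the occluded mask marks only the hidden third column, and taking the union of the visible and occluded masks recovers the amodal mask. A visible-to-amodal design predicts the visible mask first and then extends it toward the amodal mask, which is the direction this sketch follows.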