Perceiving the complete shape of occluded objects is essential for both human and machine intelligence. The amodal segmentation task aims to predict the complete mask of partially occluded objects, but annotating pixel-level ground-truth amodal masks is time-consuming and labor-intensive. Box-level supervised amodal segmentation addresses this challenge by using only ground-truth bounding boxes and instance classes as supervision, which removes the need for exhaustive pixel-level annotation. Nevertheless, current box-level methods produce low-resolution masks with imprecise boundaries and thus fall short of the demands of practical real-world applications. We present a novel solution to this problem: a directed expansion approach that expands visible masks into their corresponding amodal masks. Our approach uses a hybrid end-to-end network built around the overlapping region, i.e., the area where different instances intersect, and applies different segmentation strategies to overlapping and non-overlapping regions according to their distinct characteristics. To guide the expansion of visible masks, we introduce a carefully designed connectivity loss for overlapping regions, which exploits correlations with the visible masks and facilitates accurate amodal segmentation. Experiments on several challenging datasets show that our method outperforms existing state-of-the-art methods by large margins.
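The abstract does not specify how overlapping regions are obtained. Because only ground-truth boxes are available as supervision, one plausible reading is that a pixel belongs to an overlapping region when at least two instance boxes cover it. The sketch below illustrates that assumption only. The function names `overlap_region_mask` and `instance_regions`, and the `(x1, y1, x2, y2)` box convention with exclusive `x2`/`y2`, are hypothetical. It does not reproduce the paper's network or its connectivity loss, neither of which is detailed here.

```python
import numpy as np


def overlap_region_mask(boxes, height, width):
    """Boolean (H, W) mask of pixels covered by two or more boxes.

    Hypothetical convention: boxes are (x1, y1, x2, y2) integer pixel
    coordinates with x2/y2 exclusive.
    """
    coverage = np.zeros((height, width), dtype=np.int32)
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image bounds before counting coverage.
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        if x2 > x1 and y2 > y1:
            coverage[y1:y2, x1:x2] += 1
    # Assumption: "overlapping" means at least two instances cover the pixel.
    return coverage >= 2


def instance_regions(boxes, height, width):
    """Split each box into (overlapping, non-overlapping) pixel masks."""
    overlap = overlap_region_mask(boxes, height, width)
    regions = []
    for x1, y1, x2, y2 in boxes:
        inside = np.zeros((height, width), dtype=bool)
        inside[max(0, y1):min(height, y2), max(0, x1):min(width, x2)] = True
        # The two parts could then be handled by different strategies.
        regions.append((inside & overlap, inside & ~overlap))
    return regions


regions = instance_regions([(0, 0, 4, 4), (2, 2, 6, 6)], 6, 6)
print(int(regions[0][0].sum()), int(regions[0][1].sum()))
```

For the two boxes above, the first 4x4 box shares a 2x2 patch with the second box. Its 16 pixels therefore split into 4 overlapping and 12 non-overlapping pixels.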