Perceiving the complete shape of occluded objects is essential for both human and machine intelligence. Amodal segmentation aims to predict the complete mask of a partially occluded object, but annotating pixel-level ground-truth amodal masks is time-consuming and labor-intensive. Box-level supervised amodal segmentation addresses this challenge by relying solely on ground-truth bounding boxes and instance classes as supervision, thereby alleviating the need for exhaustive pixel-level annotation. Nevertheless, current box-level methods produce low-resolution masks with imprecise boundaries and therefore fail to meet the demands of real-world applications. We tackle this problem with a directed expansion approach that grows visible masks into their corresponding amodal masks. Our approach uses a hybrid end-to-end network built around the overlapping region, i.e., the area where different instances intersect. Overlapping and non-overlapping regions are segmented with different strategies suited to their distinct characteristics. To guide the expansion of visible masks, we introduce a carefully designed connectivity loss for overlapping regions that leverages correlations with the visible masks and facilitates accurate amodal segmentation. Experiments on several challenging datasets show that the proposed method outperforms existing state-of-the-art methods by large margins.
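The abstract defines the overlapping region as the area where different instances intersect, but does not say how it is computed. As a minimal sketch, assume it is derived from the ground-truth bounding boxes, which are the only spatial supervision in the box-level setting. The names `Box`, `intersect`, and `overlap_mask` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), max coordinates exclusive.
Box = Tuple[int, int, int, int]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def overlap_mask(boxes: List[Box], index: int,
                 height: int, width: int) -> List[List[bool]]:
    """Mark pixels of instance `index` that also fall inside another instance's box.

    Assumption: the overlapping region is approximated by box intersections;
    the remaining pixels of the box form the non-overlapping region.
    """
    mask = [[False] * width for _ in range(height)]
    for j, other in enumerate(boxes):
        if j == index:
            continue
        inter = intersect(boxes[index], other)
        if inter is None:
            continue
        x0, y0, x1, y1 = inter
        # Clip to the image bounds before filling.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                mask[y][x] = True
    return mask
```

Under this assumption, a network could route pixels where the mask is true to one segmentation strategy and the rest of the box to another, mirroring the split between overlapping and non-overlapping regions described above.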