Perceiving the complete shape of occluded objects is essential for both human and machine intelligence. Amodal segmentation aims to predict the complete mask of a partially occluded object, but annotating pixel-level ground-truth amodal masks is time-consuming and labor-intensive. Box-level supervised amodal segmentation addresses this challenge by relying solely on ground-truth bounding boxes and instance classes as supervision, removing the need for exhaustive pixel-level annotation. However, current box-level methods produce low-resolution masks with imprecise boundaries, which falls short of the demands of real-world applications. We tackle this problem with a directed expansion approach that grows visible masks into their corresponding amodal masks. Our approach is a hybrid end-to-end network built around the overlapping region, that is, the area where different instances intersect. Because overlapping and non-overlapping regions have distinct characteristics, we apply a different segmentation strategy to each. To guide the expansion of visible masks, we introduce a connectivity loss for overlapping regions, which leverages correlations with the visible masks and facilitates accurate amodal segmentation. Experiments on several challenging datasets show that the proposed method outperforms existing state-of-the-art methods by large margins.
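To make the notion of an overlapping region concrete, the following is a minimal sketch and not the paper's implementation. It assumes that, under box-level supervision, the overlapping region can be approximated as the set of pixels covered by at least two ground-truth bounding boxes; the abstract does not state how the region is actually computed. The function name `overlap_region` and the `(x1, y1, x2, y2)` box format with exclusive upper bounds are hypothetical choices made for illustration.

```python
def overlap_region(boxes, height, width):
    """Return a height x width grid of booleans, True where >= 2 boxes overlap.

    Assumption (not from the paper): boxes are (x1, y1, x2, y2) in pixel
    coordinates, with x2 and y2 exclusive.
    """
    # Count how many boxes cover each pixel.
    coverage = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image bounds before counting.
        for y in range(max(y1, 0), min(y2, height)):
            for x in range(max(x1, 0), min(x2, width)):
                coverage[y][x] += 1
    # A pixel belongs to the overlapping region if two or more boxes cover it.
    return [[count >= 2 for count in row] for row in coverage]


# Two hypothetical instance boxes that intersect in a 2 x 2 patch.
boxes = [(0, 0, 4, 4), (2, 2, 6, 6)]
overlap = overlap_region(boxes, height=6, width=6)
overlap_pixels = sum(cell for row in overlap for cell in row)
```

Pixels outside this map would fall in the non-overlapping region, where, per the abstract, a different segmentation strategy applies. How the network treats each region, and the exact form of the connectivity loss, are not specified in the abstract and are left out of this sketch.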