Part-aware panoptic segmentation is a computer vision problem that aims to provide a semantic understanding of a scene at multiple levels of granularity. More precisely, semantic areas, object instances, and semantic parts are predicted simultaneously. In this paper, we present Joint Panoptic Part Fusion (JPPF), which effectively combines the three individual segmentations to obtain a panoptic-part segmentation. Two aspects are of utmost importance for this. First, a unified model for the three problems is desired that allows for mutually improved and consistent representation learning. Second, the combination must be balanced so that all individual results receive equal importance during fusion. Our proposed JPPF is parameter-free and dynamically balances its inputs. The method is evaluated and compared on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets in terms of PartPQ and Part-Whole Quality (PWQ). In extensive experiments, we verify the importance of our fair fusion, show that its impact is greatest for areas that can be further segmented into parts, and demonstrate the generalization capabilities of our design on 5 additional datasets without fine-tuning.