Depth-aware panoptic segmentation is an emerging topic in computer vision that combines semantic and geometric understanding for more robust scene interpretation. Recent works pursue unified frameworks to tackle this challenge, but most still treat it as two individual learning tasks, which limits their potential for exploring cross-domain information. We propose a deeply unified framework for depth-aware panoptic segmentation that performs joint segmentation and depth estimation, both in a per-segment manner, with identical object queries. To narrow the gap between the two tasks, we further design a geometric query enhancement method that integrates scene geometry into the object queries using latent representations. In addition, we propose a bi-directional guidance learning approach that facilitates cross-task feature learning by exploiting the mutual relations between the two tasks. Our method sets a new state of the art for depth-aware panoptic segmentation on both the Cityscapes-DVPS and SemKITTI-DVPS datasets. Moreover, our guidance learning approach is shown to improve performance even under incomplete supervision labels.
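The per-segment formulation with identical object queries can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `per_segment_predict`, the linear projections `w_mask` and `w_depth`, the sigmoid depth head, and the 80 m depth cap are all assumptions made for illustration. The sketch only shows the general idea that one set of queries yields both a mask and a depth map per segment, and that the final depth at each pixel is read from the segment that owns it.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def per_segment_predict(queries, pixel_feats, w_mask, w_depth, max_depth=80.0):
    """Hypothetical per-segment head shared by segmentation and depth.

    queries:     (N, C) object queries, one per candidate segment
    pixel_feats: (C, H, W) per-pixel feature map
    w_mask:      (C, C) projection producing mask embeddings (assumed)
    w_depth:     (C, C) projection producing depth embeddings (assumed)
    max_depth:   assumed upper bound on predicted depth
    """
    c, h, w = pixel_feats.shape
    flat = pixel_feats.reshape(c, h * w)              # (C, H*W)

    # The same queries feed both branches.
    mask_embed = queries @ w_mask                     # (N, C)
    depth_embed = queries @ w_depth                   # (N, C)

    mask_logits = mask_embed @ flat                   # (N, H*W)
    seg_depth = max_depth * sigmoid(depth_embed @ flat)  # (N, H*W)

    # Each pixel is assigned to the query with the highest mask logit,
    # and takes its depth from that query's depth map.
    owner = mask_logits.argmax(axis=0)                # (H*W,)
    depth = seg_depth[owner, np.arange(h * w)]        # (H*W,)
    return owner.reshape(h, w), depth.reshape(h, w)


# Usage with random inputs (shapes only; the values carry no meaning).
rng = np.random.default_rng(0)
num_queries, channels, height, width = 4, 8, 6, 10
segment_ids, depth_map = per_segment_predict(
    rng.normal(size=(num_queries, channels)),
    rng.normal(size=(channels, height, width)),
    rng.normal(size=(channels, channels)),
    rng.normal(size=(channels, channels)),
)
```

Here `segment_ids` holds the index of the winning query at each pixel and `depth_map` holds the depth predicted by that same query, so the two outputs are consistent by construction.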