Monocular geometric scene understanding combines panoptic segmentation and self-supervised depth estimation, with a focus on real-time application in autonomous vehicles. We introduce MGNiceNet, a unified approach that uses a linked kernel formulation for panoptic segmentation and self-supervised depth estimation. MGNiceNet builds on the state-of-the-art real-time panoptic segmentation method RT-K-Net and extends its architecture to cover both panoptic segmentation and self-supervised monocular depth estimation. To this end, we introduce a tightly coupled self-supervised depth predictor that explicitly uses information from the panoptic path for depth prediction. Furthermore, we introduce a panoptic-guided motion masking method that improves depth estimation without relying on video panoptic segmentation annotations. We evaluate our method on two popular autonomous driving datasets, Cityscapes and KITTI. Our model achieves state-of-the-art results among real-time methods and narrows the gap to computationally more demanding methods. Source code and trained models are available at https://github.com/markusschoen/MGNiceNet.
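The abstract does not spell out how the panoptic-guided motion masking works, so the following is only a minimal sketch of the general idea, not the MGNiceNet implementation. It assumes that masking means excluding pixels whose predicted class is potentially moving (for example, vehicles) from a per-pixel photometric loss between a target frame and a warped source frame. The function name `masked_photometric_loss`, the class ID `13`, and the use of a plain L1 error are all illustrative assumptions; the actual method is in the linked repository.

```python
from typing import Sequence, Set


def masked_photometric_loss(
    target: Sequence[Sequence[float]],
    warped: Sequence[Sequence[float]],
    class_ids: Sequence[Sequence[int]],
    dynamic_classes: Set[int],
) -> float:
    """Mean absolute photometric error over pixels that are NOT labelled
    with a potentially moving class (hypothetical simplification)."""
    total = 0.0
    count = 0
    for t_row, w_row, c_row in zip(target, warped, class_ids):
        for t, w, c in zip(t_row, w_row, c_row):
            if c in dynamic_classes:
                # Pixel belongs to a possibly moving object: skip it, since
                # it would violate the static-scene assumption of the loss.
                continue
            total += abs(t - w)
            count += 1
    # Return 0.0 if every pixel was masked out, to avoid division by zero.
    return total / count if count else 0.0


# Toy 2x2 grayscale frames; class ID 13 stands in for a "car" label
# (an assumed ID, chosen only for illustration).
target = [[0.5, 0.5], [0.5, 0.5]]
warped = [[0.5, 0.25], [1.0, 0.5]]
class_ids = [[0, 0], [13, 0]]

loss_masked = masked_photometric_loss(target, warped, class_ids, {13})
loss_unmasked = masked_photometric_loss(target, warped, class_ids, set())
```

In this toy example, the pixel with the largest error lies on the assumed moving object, so the masked loss is lower than the unmasked one. This illustrates why removing dynamic regions can stabilize self-supervised depth training.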