Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View

Recent vision-only perception models for autonomous driving have achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step, and the main bottleneck of these methods, is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model this feature transformation. Existing works either rely on non-parametric depth distribution modeling, which leads to significant memory consumption, or sidestep that cost by ignoring geometry information. In contrast, we propose to use parametric depth distribution modeling for feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. Then, we aggregate the 3D feature volume into the BEV frame based on the 3D space occupancy derived from depth. Finally, we use the transformed features for downstream tasks such as object detection and semantic segmentation. Existing semantic segmentation methods also suffer from a hallucination problem because they do not take visibility information into account. This hallucination can be particularly problematic for subsequent modules such as control and planning. To mitigate this issue, our method provides depth uncertainty and reliable visibility-aware estimations. We further leverage our parametric depth modeling to present a novel visibility-aware evaluation metric that, when taken into account, can mitigate the hallucination problem. Extensive experiments on object detection and semantic segmentation on the nuScenes dataset demonstrate that our method outperforms existing methods on both tasks.
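The lift-then-aggregate pipeline summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the distribution family, so the sketch assumes each pixel predicts the mean and standard deviation of a Gaussian depth. The depth-bin layout, the helper names (`depth_bin_probs`, `bin_visibility`, `lift`, `splat_to_bev_toy`) and the toy BEV pooling, which skips camera intrinsics and extrinsics entirely, are all hypothetical. The occupancy of a depth bin is taken as the probability mass the Gaussian assigns to it. Visibility is taken as the probability that the surface lies beyond the bin's near edge.

```python
import math
import numpy as np

# Vectorized error function from the standard library (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)


def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x; all arguments broadcast."""
    return 0.5 * (1.0 + _erf((x - mu) / (sigma * math.sqrt(2.0))))


def depth_bin_probs(mu, sigma, bin_edges):
    """Occupancy: probability mass of each depth bin under each pixel's depth.

    Assumption: per-pixel Gaussian depth (the family is not given in the source).
    mu, sigma: (H, W) predicted depth mean and standard deviation.
    bin_edges: (D + 1,) strictly increasing depths.
    Returns (H, W, D); sums fall below 1 if mass lies outside the binned range.
    """
    cdf = gaussian_cdf(bin_edges[None, None, :], mu[..., None], sigma[..., None])
    return cdf[..., 1:] - cdf[..., :-1]


def bin_visibility(mu, sigma, bin_edges):
    """P(surface depth > near edge of bin), i.e. the chance the bin is not occluded."""
    cdf_near = gaussian_cdf(bin_edges[None, None, :-1], mu[..., None], sigma[..., None])
    return 1.0 - cdf_near


def lift(features, probs):
    """Weight each pixel's feature by its depth-bin occupancy.

    features: (H, W, C); probs: (H, W, D) -> frustum volume (H, W, D, C).
    """
    return probs[..., None] * features[:, :, None, :]


def splat_to_bev_toy(volume):
    """Hypothetical toy pooling: treat (image column, depth bin) as one BEV cell
    and sum over image rows. A real model would project with camera geometry."""
    return volume.sum(axis=0)  # (W, D, C)


# Usage with random stand-ins for the network's outputs.
rng = np.random.default_rng(0)
H, W, C = 4, 6, 8
mu = rng.uniform(5.0, 40.0, size=(H, W))     # hypothetical depth-mean head output
sigma = rng.uniform(0.5, 3.0, size=(H, W))   # hypothetical depth-std head output
edges = np.linspace(1.0, 60.0, 60)           # 60 edges -> 59 depth bins
feats = rng.standard_normal((H, W, C))

probs = depth_bin_probs(mu, sigma, edges)
bev = splat_to_bev_toy(lift(feats, probs))
vis = bin_visibility(mu, sigma, edges)
print(probs.shape, bev.shape, vis.shape)  # (4, 6, 59) (6, 59, 8) (4, 6, 59)
```

Only two numbers per pixel are predicted instead of a D-way histogram, and the per-bin probabilities are derived from them on the fly. Visibility decreases monotonically with depth, so cells behind a confidently predicted surface receive low visibility. A visibility-aware metric could plausibly use this signal to discount predictions in occluded regions.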