Estimating 3D occupancy from surround-view images is an exciting development in autonomous driving, following the success of Bird's-Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, improving the overall understanding and perception of the surrounding space. However, the task still lacks a baseline that defines it, including its network design, optimization, and evaluation. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors for the task. In addition, we explore the relationship between 3D occupancy estimation and related tasks such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the occupancy metric, which can be flexibly applied to current public datasets. Moreover, we establish a new benchmark in terms of depth estimation metrics, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets. The relevant code will be available at https://github.com/GANWANSHUI/SimpleOccupancy
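The depth benchmark above compares against monocular depth estimation methods, which are conventionally scored with the error and accuracy metrics popularized by Eigen et al.: Abs Rel, Sq Rel, RMSE, RMSE log, and the threshold accuracies δ < 1.25, 1.25², 1.25³. The sketch below shows how those standard metrics are typically computed over pixels with valid ground truth. It is an illustration under stated assumptions, not the SimpleOccupancy evaluation code: the function name `depth_metrics`, the valid depth range (`min_depth`, `max_depth`), and the absence of median scaling are all assumptions, since the abstract does not specify the exact protocol.

```python
import math
from typing import Dict, Sequence


def depth_metrics(
    gt: Sequence[float],
    pred: Sequence[float],
    min_depth: float = 1e-3,  # ASSUMED valid range; the actual benchmark limits may differ
    max_depth: float = 80.0,
) -> Dict[str, float]:
    """Hypothetical helper: standard monocular depth error/accuracy metrics."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must have the same length")
    pairs = []
    for g, p in zip(gt, pred):
        # Keep only pixels whose ground-truth depth lies in the valid range
        # (projected LiDAR ground truth is typically 0 where no point lands).
        if min_depth < g < max_depth:
            # Clamp the prediction to the same range, a common convention.
            pairs.append((g, min(max(p, min_depth), max_depth)))
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)
    # Threshold accuracy: fraction of pixels with max(g/p, p/g) below 1.25^k.
    ratios = [max(g / p, p / g) for g, p in pairs]
    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "a1": sum(r < 1.25 for r in ratios) / n,
        "a2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "a3": sum(r < 1.25 ** 3 for r in ratios) / n,
    }


# Usage: the third pixel has no ground truth (0.0) and is ignored.
m = depth_metrics([10.0, 20.0, 0.0], [11.0, 20.0, 5.0])
print(round(m["abs_rel"], 4), m["a1"])  # 0.05 1.0
```

Lower is better for the four error metrics and higher is better for the δ accuracies. Methods trained without metric-scale supervision are often evaluated after per-image median scaling, which this sketch deliberately omits.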