Estimating 3D occupancy from surround-view images is an exciting emerging task in autonomous driving, following the success of Bird's Eye View (BEV) perception. The task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. In this work, we present a simple CNN-based framework for 3D occupancy estimation, designed to reveal several key factors of the task, including network design, optimization, and evaluation. In addition, we explore the relationship between 3D occupancy estimation and related tasks such as monocular depth estimation and 3D reconstruction, which could advance the study of 3D perception in autonomous driving. For evaluation, we propose a simple sampling strategy to define the occupancy evaluation metric, which can be flexibly applied to current public datasets. Moreover, we establish a benchmark in terms of depth estimation metrics, comparing our method with monocular depth estimation methods on the DDAD and nuScenes datasets, where it achieves competitive performance. The relevant code will be made available at https://github.com/GANWANSHUI/SimpleOccupancy.
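For readers unfamiliar with the depth estimation metrics mentioned above, the following is a minimal sketch of the error and accuracy measures conventionally reported in monocular depth estimation (AbsRel, SqRel, RMSE, RMSE log, and the δ < 1.25^k accuracies). It is an illustration under stated assumptions, not the evaluation code of this work: the function name `depth_metrics`, the valid-depth range `[min_depth, max_depth]`, and the clamping of predictions are all assumptions, and no median scaling is applied.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float],
                  min_depth: float = 1e-3, max_depth: float = 80.0) -> Dict[str, float]:
    """Conventional monocular depth-estimation metrics (hypothetical helper).

    Only pixels whose ground-truth depth lies strictly inside
    (min_depth, max_depth) are evaluated, and predictions are clamped to
    the same range. The default range is an assumption, not necessarily
    the protocol used for DDAD or nuScenes in this work.
    """
    # Keep valid ground-truth pixels and clamp the matching predictions.
    pairs = [(min(max(p, min_depth), max_depth), g)
             for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)

    # Error metrics (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)

    # Threshold accuracies (higher is better): fraction with max ratio below 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "a1": sum(r < 1.25 for r in ratios) / n,
        "a2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "a3": sum(r < 1.25 ** 3 for r in ratios) / n,
    }


if __name__ == "__main__":
    # One of three pixels is off by 5 m on a 25 m target.
    print(depth_metrics([10.0, 20.0, 30.0], [10.0, 25.0, 30.0]))
```

Pixels with zero (missing) ground truth are skipped rather than counted as errors, which matches the common practice for sparse LiDAR-projected depth maps.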