FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

In autonomous driving, 3D occupancy prediction outputs voxel-wise occupancy status and semantic labels. This gives a more comprehensive understanding of 3D scenes than traditional perception tasks such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Recent work has extensively explored many aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, in pursuit of higher accuracy. However, inference speed, which is crucial for deployment on an autonomous vehicle, has been largely neglected. To address this, we propose a new method, dubbed FastOcc. We analyze the accuracy and latency of four components: the input image resolution, the image backbone, the view transformation, and the occupancy prediction head. This analysis shows that the occupancy prediction head offers considerable room to accelerate the model while preserving its accuracy. Targeting this component, we replace the time-consuming 3D convolution network with a novel residual-like architecture. In this architecture, features are mainly processed by a lightweight 2D BEV convolution network and compensated by integrating 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that FastOcc achieves state-of-the-art results with fast inference speed.
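The residual-like head described in the abstract can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the FastOcc implementation. The following are all hypothetical choices made for illustration: a single zero-skew pinhole camera, a naive 3x3 convolution standing in for the lightweight 2D BEV network, lifting BEV features to 3D by repeating them along the height axis, and every function and parameter name (`conv2d_3x3`, `bilinear_sample`, `fuse_bev_and_voxel`, `K`, `w_bev`, `w_cls`).

```python
import numpy as np


def conv2d_3x3(x, w):
    """Naive 'same'-padded 3x3 convolution (cross-correlation).

    x: (C_in, NX, NY) BEV features; w: (C_out, C_in, 3, 3) kernel.
    """
    _, nx, ny = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad the two spatial axes
    out = np.zeros((w.shape[0], nx, ny))
    for i in range(3):
        for j in range(3):
            patch = xp[:, i:i + nx, j:j + ny]  # input shifted by (i-1, j-1)
            out += np.einsum("oc,cxy->oxy", w[:, :, i, j], patch)
    return out


def bilinear_sample(feat, u, v):
    """Bilinearly sample feat (C, H, W) at float pixel coords (u=column, v=row).

    Neighbours that fall outside the image contribute zero.
    Returns an array of shape (C,) + u.shape.
    """
    c, h, w = feat.shape
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    out = np.zeros((c,) + u.shape)
    corners = [(0, 0, (1 - dv) * (1 - du)), (0, 1, (1 - dv) * du),
               (1, 0, dv * (1 - du)), (1, 1, dv * du)]
    for oy, ox, weight in corners:
        yy, xx = v0 + oy, u0 + ox
        valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        out += feat[:, np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)] * (weight * valid)
    return out


def fuse_bev_and_voxel(bev, w_bev, img_feat, voxel_cam, K, w_cls):
    """Hypothetical residual-like head: 2D BEV branch + voxels sampled from the image.

    bev: (C_in, NX, NY); w_bev: (C, C_in, 3, 3); img_feat: (C, H, W);
    voxel_cam: (NX, NY, NZ, 3) voxel centres already in the camera frame
    (x right, y down, z forward); K: 3x3 zero-skew pinhole intrinsics;
    w_cls: (num_classes, C). Returns logits of shape (num_classes, NX, NY, NZ).
    """
    # Main path: lightweight 2D convolution on the BEV plane, then ReLU.
    bev_out = np.maximum(conv2d_3x3(bev, w_bev), 0.0)
    # Assumption: lift BEV features to 3D by repeating them along the height axis.
    nz = voxel_cam.shape[2]
    lifted = np.repeat(bev_out[..., None], nz, axis=-1)  # (C, NX, NY, NZ)
    # Compensation path: project voxel centres into the image and interpolate.
    xc, yc, zc = voxel_cam[..., 0], voxel_cam[..., 1], voxel_cam[..., 2]
    in_front = zc > 1e-6
    zs = np.where(in_front, zc, 1.0)  # avoid dividing by zero behind the camera
    u = K[0, 0] * xc / zs + K[0, 2]
    v = K[1, 1] * yc / zs + K[1, 2]
    interp = bilinear_sample(img_feat, u, v) * in_front  # (C, NX, NY, NZ)
    # Residual-style sum, then a per-voxel linear classifier.
    fused = lifted + interp
    return np.einsum("kc,cxyz->kxyz", w_cls, fused)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gx, gy, gz = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 5),
                             np.linspace(-0.5, 0.5, 3), indexing="ij")
    # Hypothetical placement: BEV x -> camera x, BEV y -> depth, height -> camera y.
    voxel_cam = np.stack([gx, gz, gy + 5.0], axis=-1)
    K = np.array([[10.0, 0.0, 3.5], [0.0, 10.0, 3.0], [0.0, 0.0, 1.0]])
    logits = fuse_bev_and_voxel(rng.standard_normal((2, 4, 5)),
                                rng.standard_normal((3, 2, 3, 3)),
                                rng.standard_normal((3, 6, 7)), voxel_cam, K,
                                rng.standard_normal((4, 3)))
    print(logits.shape)  # → (4, 4, 5, 3)
```

In this sketch, the expensive work stays on the 2D BEV plane. The per-voxel image samples are only added on top, which mirrors the "digest in 2D, compensate in 3D" idea in the abstract. The specific lifting and projection choices here are simplifications.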
