3D semantic occupancy prediction enables autonomous vehicles (AVs) to perceive the fine-grained geometric and semantic structure of their surroundings from onboard sensors, which is essential for safe decision-making and navigation. Recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and semantic classes. However, existing methods rely heavily on intermediate representations built from dense 3D voxel volumes or sets of 3D Gaussians, which hinders their ability to capture fine-grained geometric detail in the 3D driving environment both efficiently and effectively. This paper introduces TFusionOcc, a novel object-centric multi-sensor fusion framework for 3D semantic occupancy prediction. By combining multi-stage multi-sensor fusion, the Student's t-distribution, and the T-Mixture model (TMM) with more geometrically flexible primitives, namely deformable superquadrics (superquadrics with an inverse warp), the proposed method achieves state-of-the-art (SOTA) performance on the nuScenes benchmark. In addition, extensive experiments on the nuScenes-C dataset demonstrate the robustness of the proposed method under a variety of camera and LiDAR corruption scenarios. The code will be available at: https://github.com/DanielMing123/TFusionOcc
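For background on the superquadric primitive mentioned above, the standard (Barr) formulation defines a shape by the implicit inside-outside function below; the deformable variant additionally applies an inverse warp to the query point before evaluation. This is generic background notation with scale parameters $a_1, a_2, a_3$ and shape exponents $\epsilon_1, \epsilon_2$, not the paper's exact parameterization.

```latex
f(x, y, z) =
\left(
  \left(\frac{x}{a_1}\right)^{\frac{2}{\epsilon_2}}
  + \left(\frac{y}{a_2}\right)^{\frac{2}{\epsilon_2}}
\right)^{\frac{\epsilon_2}{\epsilon_1}}
+ \left(\frac{z}{a_3}\right)^{\frac{2}{\epsilon_1}},
\qquad
\begin{cases}
f < 1 & \text{point inside the surface} \\
f = 1 & \text{point on the surface} \\
f > 1 & \text{point outside the surface}
\end{cases}
```

Varying $\epsilon_1, \epsilon_2$ smoothly interpolates between ellipsoids, boxes, and cylinders, which is what makes superquadrics more geometrically flexible than fixed-shape primitives such as Gaussians.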