Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving

Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. This paper introduces OccFiner, a novel offboard framework designed to enhance the accuracy of vision-based occupancy predictions. OccFiner operates in two hybrid phases: 1) a multi-to-multi local propagation network, which implicitly aligns and processes multiple local frames to correct onboard model errors and consistently enhance occupancy accuracy across all distances; and 2) a region-centric global propagation, which refines labels using explicit multi-view geometry and integrating sensor bias, in particular to increase the accuracy of distant occupied voxels. Extensive experiments demonstrate that OccFiner improves both geometric and semantic accuracy across various types of coarse occupancy, setting a new state of the art on the SemanticKITTI dataset. Notably, OccFiner elevates vision-based SSC models to a level that even surpasses LiDAR-based onboard SSC models. Furthermore, OccFiner is the first to achieve automatic annotation of SSC in a purely vision-based approach. Quantitative experiments show that OccFiner successfully facilitates occupancy data loop closure in autonomous driving. Additionally, we quantitatively and qualitatively validate the superiority of the offboard approach on city-level SSC static maps. The source code will be made publicly available at https://github.com/MasterHow/OccFiner.
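To make the idea of explicit multi-view label refinement more concrete, the following is a minimal sketch, not the OccFiner implementation. It assumes each frame supplies occupied voxel centers in the ego frame, a semantic label per voxel, and an ego-to-world pose; it then fuses labels in a shared world grid by weighted voting. The function name `fuse_frames`, the weight `1 / (1 + distance)` standing in for a sensor-bias term, and the default voxel size are all hypothetical choices made for illustration.

```python
import numpy as np


def fuse_frames(frames, voxel_size=0.2, num_classes=3):
    """Fuse per-frame voxel labels into one world-frame label map.

    frames: iterable of (points, labels, pose), where
      points is an (N, 3) array of voxel centers in the ego frame,
      labels is an (N,) integer array of class ids in [0, num_classes),
      pose   is a (4, 4) homogeneous ego-to-world transform.
    Returns a dict mapping world voxel index (i, j, k) to the fused class id.
    """
    votes = {}
    for points, labels, pose in frames:
        points = np.asarray(points, dtype=float)
        labels = np.asarray(labels, dtype=int)
        pose = np.asarray(pose, dtype=float)

        # Assumed stand-in for sensor bias: predictions made close to the
        # camera are trusted more than predictions made far away.
        dist = np.linalg.norm(points, axis=1)
        weights = 1.0 / (1.0 + dist)

        # Move voxel centers from the ego frame into the world frame.
        homo = np.hstack([points, np.ones((len(points), 1))])
        world = (homo @ pose.T)[:, :3]

        # Quantize world coordinates to integer voxel indices.
        idx = np.floor(world / voxel_size).astype(np.int64)

        # Accumulate weighted votes per world voxel and class.
        for key, lab, w in zip(map(tuple, idx.tolist()), labels.tolist(), weights):
            votes.setdefault(key, np.zeros(num_classes))[lab] += w

    # Each voxel takes the class with the largest accumulated weight.
    return {key: int(np.argmax(v)) for key, v in votes.items()}
```

In this sketch, a voxel that one frame sees from far away and another frame sees from nearby takes the label of the nearby observation, which mirrors the motivation for refining distant voxels offboard once the whole sequence is available.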
