Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the accuracy of vision-based occupancy predictions. OccFiner operates in two hybrid phases: 1) a multi-to-multi local propagation network that implicitly aligns and processes multiple local frames, correcting onboard model errors and consistently enhancing occupancy accuracy across all distances; and 2) a region-centric global propagation that refines labels using explicit multi-view geometry and integrates sensor bias, particularly to increase the accuracy of distant occupied voxels. Extensive experiments demonstrate that OccFiner improves both geometric and semantic accuracy across various types of coarse occupancy, setting a new state-of-the-art performance on the SemanticKITTI dataset. Notably, OccFiner elevates vision-based SSC models to a level even surpassing that of LiDAR-based onboard SSC models.