In autonomous vehicle (AV) perception, understanding 3D scenes is essential for tasks such as planning and mapping. Semantic scene completion (SSC) aims to infer scene geometry and semantics from limited observations. Although camera-based SSC has gained popularity because cameras are affordable and provide rich visual cues, existing methods often neglect the inherent uncertainty of the underlying models. To address this, we propose an uncertainty-aware camera-based 3D semantic scene completion method ($\alpha$-SSC). Our approach includes a framework that propagates uncertainty from depth models (Depth-UP) to enhance geometry completion (up to 11.58% improvement) and semantic segmentation (up to 14.61% improvement). We also propose a hierarchical conformal prediction (HCP) method to quantify SSC uncertainty, effectively addressing high-level class imbalance in SSC datasets. On the geometry level, we present a novel KL divergence-based score function that significantly improves the occupied recall of safety-critical classes (45% improvement) with minimal performance overhead (3.4% reduction). For uncertainty quantification, we demonstrate that the method achieves smaller prediction set sizes while maintaining a defined coverage guarantee: compared with baselines, it reduces set sizes by up to 85%. Together, these contributions advance SSC accuracy and robustness, marking a step forward for autonomous perception systems.
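To make the notions of "prediction set" and "coverage guarantee" concrete, the following is a minimal sketch of standard split conformal prediction for a classifier. It is not the hierarchical (HCP) procedure or the KL divergence-based score function proposed here; the function names `conformal_quantile` and `prediction_sets`, the score `1 - p(true class)`, and the assumption of a held-out calibration set are illustrative choices only.

```python
import math

import numpy as np


def conformal_quantile(cal_probs, cal_labels, alpha):
    """Return the score threshold q_hat from a held-out calibration set.

    cal_probs:  (n, K) array of predicted class probabilities.
    cal_labels: (n,) array of integer ground-truth labels.
    alpha:      target miscoverage rate (e.g. 0.1 for 90% coverage).
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true class.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Rank of the finite-sample corrected quantile.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return float(scores[k - 1])


def prediction_sets(test_probs, q_hat):
    """Boolean (m, K) mask; class k is in the set when 1 - p_k <= q_hat."""
    return (1.0 - test_probs) <= q_hat
```

Under the usual exchangeability assumption between calibration and test samples, sets built this way contain the true label with probability at least `1 - alpha`. A smaller average set size at the same coverage level indicates a more informative uncertainty estimate, which is the quantity the reported set-size reduction refers to.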