Monocular scene understanding is a foundational component of autonomous systems. Among monocular perception tasks, semantic scene completion (SSC) is especially important for holistic 3D scene understanding: it jointly completes the semantics and geometry of a scene from RGB input. However, progress in SSC, particularly in large-scale street scenes, is hindered by the scarcity of high-quality datasets. To address this issue, we introduce SSCBench, a comprehensive benchmark that integrates scenes from widely used automotive datasets (e.g., KITTI-360, nuScenes, and Waymo). SSCBench follows a setup and format already established in the community, making it easy to explore SSC methods across diverse street scenes. We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap caused by differences in sensor coverage and modality. Moreover, we unify semantic labels across the datasets to simplify cross-domain generalization testing. We commit to including more datasets and SSC models to drive further advancements in this field.