Monocular scene understanding is a foundational component of autonomous systems. Within the spectrum of monocular perception topics, one crucial task for holistic 3D scene understanding is semantic scene completion (SSC), which jointly completes semantic information and geometric details from RGB input. However, progress in SSC, particularly for large-scale street views, is hindered by the scarcity of high-quality datasets. To address this issue, we introduce SSCBench, a comprehensive benchmark that integrates scenes from widely used automotive datasets (e.g., KITTI-360, nuScenes, and Waymo). SSCBench follows the established setup and format in the community, facilitating easy exploration of SSC methods across various street views. We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap resulting from differences in sensor coverage and modality. Moreover, we unify semantic labels across the diverse datasets to simplify cross-domain generalization testing. We commit to including more datasets and SSC models to drive further advancements in this field.