Semantic scene completion aims to infer 3D geometric structures and their semantic classes from camera or LiDAR input, providing essential occupancy information for autonomous driving. Prior work concentrates on building networks or benchmarks in a fully supervised manner. However, dense occupancy grids require point-wise semantic annotations, which are expensive and tedious to label. In this paper, we build a new label-efficient benchmark, named ScribbleSC, in which sparse scribble-based semantic labels are combined with dense geometric labels for semantic scene completion. In particular, we propose a simple yet effective approach called Scribble2Scene, which bridges the gap between sparse scribble annotations and full supervision. Our method consists of constructing geometry-aware auto-labelers and training an online model with an offline-to-online distillation module that enhances performance. Experiments on SemanticKITTI demonstrate that Scribble2Scene achieves performance competitive with its fully supervised counterparts, reaching 99% of the performance of fully supervised models with only 13.5% of voxels labeled. Both the ScribbleSC annotations and our full implementation are available at https://github.com/songw-zju/Scribble2Scene.
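To make the idea of combining sparse scribble semantics with dense geometry concrete, here is a minimal sketch of how such mixed supervision could be consumed during training. It rests on several assumptions not stated in the abstract. A per-voxel classifier outputs class scores in which class 0 means "empty". Voxels without a scribble label are marked `None`. The loss is a semantic cross-entropy over scribbled voxels only, plus a binary occupancy loss over every voxel. The function `scribble_sc_loss` and this loss design are hypothetical illustrations, not the Scribble2Scene implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

EPS = 1e-7  # numerical floor to keep log() finite


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one voxel's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scribble_sc_loss(
    logits: Sequence[Sequence[float]],   # per-voxel class scores (hypothetical layout)
    semantic: Sequence[Optional[int]],   # sparse scribble label, or None if unlabeled
    occupied: Sequence[bool],            # dense geometric label for every voxel
    empty_class: int = 0,                # assumption: class 0 denotes empty space
) -> Tuple[float, float, float]:
    """Return (total, semantic, geometric) losses for one scene (illustrative only)."""
    sem_terms: List[float] = []
    geo_terms: List[float] = []
    for scores, sem, occ in zip(logits, semantic, occupied):
        probs = softmax(scores)
        # Dense geometric supervision: every voxel is known to be empty or occupied.
        p_occ = min(max(1.0 - probs[empty_class], EPS), 1.0 - EPS)
        geo_terms.append(-math.log(p_occ) if occ else -math.log(1.0 - p_occ))
        # Sparse semantic supervision: only voxels covered by a scribble contribute.
        if sem is not None:
            sem_terms.append(-math.log(max(probs[sem], EPS)))
    sem_loss = sum(sem_terms) / len(sem_terms) if sem_terms else 0.0
    geo_loss = sum(geo_terms) / len(geo_terms) if geo_terms else 0.0
    return sem_loss + geo_loss, sem_loss, geo_loss


# Toy scene with two voxels: one occupied and scribbled as class 1, one empty and unlabeled.
total, sem, geo = scribble_sc_loss(
    logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    semantic=[1, None],
    occupied=[True, False],
)
```

The design keeps the two signals separate, so unlabeled voxels still receive geometric gradients while semantic gradients come only from the scribbles.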