Semi-supervised object detection is crucial for 3D scene understanding because it reduces the dependence on large-scale 3D bounding box annotations, which are costly to acquire. Existing methods typically employ a teacher-student framework with pseudo-labeling to leverage unlabeled point clouds. However, producing reliable pseudo-labels in a diverse 3D space remains challenging. In this work, we propose Diffusion-SS3D, which takes a new perspective on improving pseudo-label quality by using a diffusion model for semi-supervised 3D object detection. Specifically, we add noise to produce corrupted 3D object size and class label distributions, and then use the diffusion model as a denoising process to obtain bounding box outputs. Moreover, we integrate the diffusion model into the teacher-student framework, so that the denoised bounding boxes improve both pseudo-label generation and the overall semi-supervised learning process. Experiments on the ScanNet and SUN RGB-D benchmark datasets show that our approach achieves state-of-the-art performance compared with existing methods. We also present extensive analysis of how our diffusion model design affects performance in semi-supervised learning.
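The corruption step mentioned in the abstract (adding noise to object sizes and class label distributions) can be illustrated with a minimal sketch. It assumes a standard DDPM-style forward process with a cosine noise schedule; the abstract does not state the schedule, the number of steps, or the data layout, so `cosine_alpha_bar`, `corrupt`, the 1000-step horizon, and the example box and label values are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random


def cosine_alpha_bar(t: int, num_steps: int, s: float = 0.008) -> float:
    """Cumulative signal level alpha_bar_t under an assumed cosine schedule."""
    def f(u: float) -> float:
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def corrupt(values, t, num_steps, rng):
    """Forward diffusion: sqrt(a) * x0 + sqrt(1 - a) * Gaussian noise."""
    a = cosine_alpha_bar(t, num_steps)
    return [
        math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for v in values
    ]


rng = random.Random(0)

# Hypothetical inputs: one box size (length, width, height) and a
# one-hot class label distribution over four classes.
box_size = [0.6, 0.8, 1.2]
class_probs = [0.0, 1.0, 0.0, 0.0]

# Corrupt both quantities at the same (assumed) timestep.
noisy_size = corrupt(box_size, t=500, num_steps=1000, rng=rng)
noisy_label = corrupt(class_probs, t=500, num_steps=1000, rng=rng)
```

At `t=0` the inputs pass through unchanged, and as `t` approaches `num_steps` they approach pure Gaussian noise. In the approach described above, a learned denoiser would take such corrupted sizes and label distributions and recover bounding box outputs; that denoiser is not sketched here.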