Semi-supervised 3D object detection (SS3DOD) aims to reduce the cost of 3D annotation by exploiting unlabeled data. Recent studies adopt pseudo-label-based teacher-student frameworks and demonstrate impressive performance. The main challenge for these frameworks lies in selecting high-quality pseudo-labels from the teacher's predictions. Most previous methods select pseudo-labels by comparing confidence scores against manually set thresholds. The latest works tackle the challenge either by dynamic thresholding or by refining the quality of pseudo-labels. However, such methods still overlook contextual information such as object distance, class, and learning state, and they assess pseudo-label quality inadequately because they use only part of the information available from the networks. In this work, we propose a novel SS3DOD framework featuring a learnable pseudo-labeling module designed to select high-quality pseudo-labels automatically and adaptively. Our approach introduces two networks at the teacher output level. These networks reliably assess the quality of pseudo-labels through score fusion and determine context-adaptive thresholds, and are supervised by the alignment between pseudo-labels and ground-truth (GT) bounding boxes. Additionally, we introduce a soft supervision strategy that learns robustly under pseudo-label noise, helping the student network prioritize cleaner labels over noisy ones during semi-supervised learning. Extensive experiments on the KITTI and Waymo datasets demonstrate the effectiveness of our method. The proposed method selects high-precision pseudo-labels while maintaining wider coverage of contexts and a higher recall rate, significantly improving the performance of related SS3DOD methods.
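To make the contrast between a manually set threshold and a context-adaptive one concrete, the following is a minimal illustrative sketch, not the method proposed here. The proposed framework learns its thresholds and quality scores with two networks; in this sketch a hand-written rule (a per-class base threshold that relaxes with distance) stands in for the learned threshold, and a sigmoid of the score margin stands in for soft supervision. All names, constants, and example values (`Prediction`, `context_threshold`, `soft_weight`, the slope, the temperature) are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    score: float     # teacher confidence in [0, 1]
    cls: str         # predicted class name
    distance: float  # distance from the sensor, in meters


def select_fixed(preds, threshold=0.7):
    """Baseline: keep predictions whose score passes one global threshold."""
    return [p for p in preds if p.score >= threshold]


def context_threshold(p, base_by_class, slope_per_meter=0.003, floor=0.3):
    """Hypothetical stand-in for a learned, context-adaptive threshold.

    Starts from a per-class base value and relaxes it for distant objects,
    never dropping below `floor`.
    """
    return max(floor, base_by_class[p.cls] - slope_per_meter * p.distance)


def soft_weight(p, threshold, temperature=0.05):
    """Assumed soft-supervision weight: sigmoid of the margin over the threshold."""
    return 1.0 / (1.0 + math.exp(-(p.score - threshold) / temperature))


def select_adaptive(preds, base_by_class):
    """Keep predictions that pass their own threshold, paired with a soft weight."""
    selected = []
    for p in preds:
        t = context_threshold(p, base_by_class)
        if p.score >= t:
            selected.append((p, soft_weight(p, t)))
    return selected


# Made-up example predictions, for illustration only.
preds = [
    Prediction(score=0.90, cls="Car", distance=10.0),
    Prediction(score=0.55, cls="Pedestrian", distance=40.0),
    Prediction(score=0.40, cls="Car", distance=60.0),
]
base_by_class = {"Car": 0.7, "Pedestrian": 0.6}

fixed = select_fixed(preds)
adaptive = select_adaptive(preds, base_by_class)
```

In this toy setting the global threshold keeps only the nearby car, while the context-dependent rule also keeps the distant pedestrian but assigns it a smaller weight, so a student loss scaled by that weight would lean more on the cleaner label.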