3D object detection aims to recover the 3D information of objects of interest and is a fundamental task in autonomous driving perception. Its performance depends heavily on the amount of labeled training data, yet high-quality annotations for point cloud data are costly to obtain. While conventional methods focus on generating pseudo-labels for unlabeled samples to supplement training, the structural nature of 3D point cloud data makes it possible to compose objects and backgrounds into realistic synthetic scenes. Motivated by this, we propose a hardness-aware scene synthesis (HASS) method that generates adaptive synthetic scenes to improve the generalization of detection models. We obtain pseudo-labels for unlabeled objects and generate diverse scenes with different compositions of objects and backgrounds. Because scene synthesis is sensitive to the quality of pseudo-labels, we further propose a hardness-aware strategy to reduce the effect of low-quality pseudo-labels, and we maintain a dynamic pseudo-database to ensure the diversity and quality of the synthetic scenes. Extensive experiments on the widely used KITTI and Waymo datasets show that HASS outperforms existing semi-supervised learning methods for 3D object detection. Code: https://github.com/wzzheng/HASS.
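The general idea of composing pseudo-labeled objects with backgrounds can be illustrated with a minimal sketch. This is not the HASS implementation (see the repository above for that). A plain confidence threshold stands in for the hardness-aware strategy, whose details are not given here. Boxes are simplified to axis-aligned bird's-eye-view rectangles. The names `PseudoObject`, `update_database`, and `synthesize_scene` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class PseudoObject:
    points: list   # (x, y, z) tuples belonging to the object
    box: tuple     # simplified axis-aligned BEV box: (x_min, y_min, x_max, y_max)
    score: float   # detector confidence attached to the pseudo-label


def boxes_overlap(a, b):
    """Return True if two axis-aligned BEV boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def update_database(database, candidates, min_score, max_size):
    """Refresh the pseudo-database with new candidates.

    Assumption: low-quality pseudo-labels are filtered by a fixed score
    threshold, and only the max_size highest-scoring objects are kept.
    """
    merged = database + [c for c in candidates if c.score >= min_score]
    merged.sort(key=lambda obj: obj.score, reverse=True)
    return merged[:max_size]


def synthesize_scene(background_points, existing_boxes, database, num_objects, rng):
    """Paste up to num_objects non-overlapping database objects into a background."""
    scene_points = list(background_points)
    boxes = list(existing_boxes)
    candidates = list(database)
    rng.shuffle(candidates)  # a random order yields different compositions per call
    pasted = 0
    for obj in candidates:
        if pasted >= num_objects:
            break
        if any(boxes_overlap(obj.box, other) for other in boxes):
            continue  # skip objects that would collide with anything already placed
        scene_points.extend(obj.points)
        boxes.append(obj.box)
        pasted += 1
    return scene_points, boxes
```

In a training loop, one would presumably call `update_database` whenever the detector produces new pseudo-labels, so the pool of objects changes as the model improves, and call `synthesize_scene` once per training sample.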