Accurate 3D object detection is crucial for autonomous vehicles and robots to navigate and interact with their environment safely and effectively. However, the performance of 3D detectors depends on the amount of training data and on its annotation, which is expensive to obtain. Consequently, demand for training with limited labeled data is growing. We explore a novel teacher-student framework that employs channel augmentation for semi-supervised 3D object detection. Teacher-student semi-supervised learning (SSL) typically applies weak augmentation to the teacher and strong augmentation to the student. In this work, we instead apply multiple channel augmentations to both networks using the transformation equivariance detector (TED). TED allows us to explore different combinations of augmentations on point clouds, and it efficiently aggregates multi-channel transformation-equivariant features. In principle, with fixed channel augmentations for the teacher network, the student can train stably on reliable pseudo-labels. Strong channel augmentations enrich the diversity of the data, fostering robustness to transformations and improving the generalization performance of the student network. We use state-of-the-art hierarchical supervision as a baseline and adapt its dual-threshold scheme to TED; we call the adapted scheme channel IoU consistency. We evaluate our method on the KITTI dataset and achieve a significant performance gain, surpassing state-of-the-art semi-supervised 3D object detection models.
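To make the augmentation scheme concrete, the following is a minimal sketch of how a fixed channel set for the teacher and randomly sampled channels for the student might be generated from one point cloud, together with one plausible reading of a dual-threshold split of pseudo-labels. Everything here is an assumption for illustration: the function names, the choice of rotation, flip, and scaling as channel transformations, the sampling ranges, and the threshold values 0.9 and 0.5 are hypothetical and are not taken from TED or from the method described above. The channel IoU consistency criterion itself is not sketched, since its definition is not given in this text.

```python
import math
import random


def rotate_z(points, angle):
    """Rotate (x, y, z) points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def flip_y(points):
    """Mirror points across the x-z plane by negating y."""
    return [(x, -y, z) for x, y, z in points]


def scale(points, factor):
    """Scale all coordinates uniformly by `factor`."""
    return [(factor * x, factor * y, factor * z) for x, y, z in points]


def teacher_channels(points):
    """Fixed channel set (assumed): identity, flip, 90-degree rotation."""
    return [list(points), flip_y(points), rotate_z(points, math.pi / 2)]


def student_channels(points, rng, n_channels=3,
                     max_angle=math.pi / 4, scale_range=(0.9, 1.1)):
    """Randomly sampled channels (assumed ranges): rotate, maybe flip, scale."""
    channels = []
    for _ in range(n_channels):
        out = rotate_z(points, rng.uniform(-max_angle, max_angle))
        if rng.random() < 0.5:
            out = flip_y(out)
        channels.append(scale(out, rng.uniform(*scale_range)))
    return channels


def split_by_dual_threshold(scores, high=0.9, low=0.5):
    """Split pseudo-label indices into confident and ambiguous groups.

    Scores below `low` are discarded. The thresholds are hypothetical.
    """
    confident = [i for i, s in enumerate(scores) if s >= high]
    ambiguous = [i for i, s in enumerate(scores) if low <= s < high]
    return confident, ambiguous


# Usage: the teacher always sees the same channels; the student's vary per call.
cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
teacher_views = teacher_channels(cloud)
student_views = student_channels(cloud, random.Random(0))
confident, ambiguous = split_by_dual_threshold([0.95, 0.7, 0.3])
```

Because the teacher's channels are deterministic, its pseudo-labels for a given scene do not change between iterations, which is the stability argument made above. The student's channels are resampled on every call, which is where the added data diversity would come from.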