Pseudo-labeling has proven advantageous for semi-supervised 3D object detection (SSOD) in Bird's-Eye-View (BEV) for autonomous driving, particularly when labeled data is limited. In the literature, an Exponential Moving Average (EMA) of the student network's weights has been used to update the weights of the teacher network. However, this update induces catastrophic forgetting in the teacher network. In this work, we address this issue by introducing the novel concept of a Reflective Teacher, in which the student is trained on both labeled and pseudo-labeled data while its knowledge is progressively passed to the teacher through a regularizer that ensures retention of previous knowledge. Additionally, we propose Geometry Aware BEV Fusion (GA-BEVFusion) for efficient alignment of multi-modal BEV features, thus reducing the disparity between the camera and LiDAR modalities. This helps to map the precise geometric information embedded among LiDAR points reliably with the spatial priors for extracting semantic information from camera images. Our experiments on the nuScenes and Waymo datasets demonstrate: 1) improved performance over state-of-the-art methods in both fully supervised and semi-supervised settings; 2) Reflective Teacher, using only 25% and 22% of the labeled data for nuScenes and Waymo respectively, achieves performance equivalent to that of fully supervised methods that utilize the full labeled dataset.
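For readers unfamiliar with the baseline mentioned in the abstract, the conventional EMA teacher update can be sketched as follows. This is only a minimal illustration of the standard mean-teacher rule, teacher = decay * teacher + (1 - decay) * student, and not the Reflective Teacher regularizer, whose form the abstract does not specify. The function name `ema_update`, the dictionary-of-lists weight layout, and the default `decay` value are assumptions made for this example.

```python
def ema_update(teacher, student, decay=0.999):
    """Return teacher weights after one standard EMA step.

    Hypothetical helper for illustration: `teacher` and `student` map
    parameter names to equal-length lists of floats.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    updated = {}
    for name in teacher:
        t_vals, s_vals = teacher[name], student[name]
        if len(t_vals) != len(s_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Blend each teacher weight toward the corresponding student weight.
        updated[name] = [
            decay * t + (1.0 - decay) * s for t, s in zip(t_vals, s_vals)
        ]
    return updated


if __name__ == "__main__":
    teacher = {"w": [1.0, 2.0]}
    student = {"w": [0.0, 4.0]}
    # A small decay is used here only to make the blending visible.
    print(ema_update(teacher, student, decay=0.5))
```

Because each step pulls every teacher weight toward the current student, with nothing anchoring it to what the teacher had learned earlier, this kind of update is the one the abstract identifies as prone to catastrophic forgetting.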