Generating annotations for bird's-eye-view (BEV) segmentation presents significant challenges due to scene complexity and the high cost of manual annotation. In this work, we address these challenges by leveraging the abundance of available unlabeled data. We propose Perspective Cue Training (PCT), a novel training framework that utilizes pseudo-labels generated from unlabeled perspective images by publicly available semantic segmentation models trained on large street-view datasets. PCT attaches a perspective-view task head to the image encoder shared with the BEV segmentation head, allowing the unlabeled data to be exploited through training on the generated pseudo-labels. Since image encoders are present in nearly all camera-based BEV segmentation architectures, PCT is flexible and applicable to a wide range of existing BEV architectures, and it can be used in any setting where unlabeled data is available. In this paper, we apply PCT to semi-supervised learning (SSL) and unsupervised domain adaptation (UDA). Additionally, we introduce strong input perturbation through Camera Dropout (CamDrop) and feature perturbation via BEV Feature Dropout (BFD), both of which are crucial for enhancing SSL within our teacher-student framework. Our approach is simple and flexible, yet yields significant improvements over various SSL and UDA baselines, achieving competitive performance even against the current state-of-the-art.
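The two perturbations named above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, tensor layouts, and drop probabilities below are assumptions chosen only to show the idea: CamDrop zeroes out entire camera views at the input, while BFD applies spatial dropout over cells of the BEV feature map.

```python
import numpy as np

def cam_drop(images, drop_prob=0.5, rng=None):
    """CamDrop (sketch): randomly blank whole camera views.

    images: array of shape (n_cams, C, H, W). Layout is an assumption.
    Dropped cameras are zeroed; at least one camera is always kept.
    """
    rng = rng or np.random.default_rng(0)
    n_cams = images.shape[0]
    keep = rng.random(n_cams) >= drop_prob
    if not keep.any():
        keep[rng.integers(n_cams)] = True  # never drop every view
    return images * keep[:, None, None, None]

def bev_feature_dropout(feat, drop_prob=0.2, rng=None):
    """BFD (sketch): spatial dropout on a BEV feature map.

    feat: array of shape (C, H_bev, W_bev). One mask is shared across
    channels; surviving cells are rescaled (inverted dropout).
    """
    rng = rng or np.random.default_rng(0)
    mask = (rng.random(feat.shape[-2:]) >= drop_prob).astype(feat.dtype)
    return feat * mask / (1.0 - drop_prob)
```

In a teacher-student setup, the student would see `cam_drop`-perturbed inputs and `bev_feature_dropout`-perturbed features, while the teacher sees the clean inputs and supplies pseudo-labels; the exact placement of these perturbations in the architecture is described in the paper, not here.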