We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network on two tasks: occupancy prediction and feature distillation. Occupancy prediction gives the model a 3D geometric understanding of the scene. However, this geometry is class-agnostic. We therefore inject semantic information into the model in 3D space by distilling features from a self-supervised pretrained image foundation model. Models pretrained with our method show improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results confirm the benefit of combining feature distillation with 3D occupancy prediction in our pretraining approach. Repository: https://github.com/valeoai/Occfeat
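The abstract does not specify the exact loss functions, so the following is only a minimal sketch of how a combined occupancy-plus-distillation pretraining objective could look. It assumes (1) a binary cross-entropy loss on per-voxel occupancy logits, (2) a cosine-similarity distillation loss between student voxel features and teacher (image foundation model) features that have already been lifted into the same voxel grid, applied only to occupied voxels, and (3) a hypothetical weighting factor `lambda_distill`. All function and parameter names here are illustrative and are not taken from the OccFeat repository.

```python
import numpy as np


def occupancy_bce(logits, target):
    """Numerically stable binary cross-entropy with logits, averaged over voxels.

    Assumption: occupancy is supervised as a per-voxel binary target (1 = occupied).
    Uses the identity  max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    logits = np.asarray(logits, dtype=float)
    target = np.asarray(target, dtype=float)
    per_voxel = np.maximum(logits, 0.0) - logits * target + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(per_voxel))


def masked_cosine_distillation(student, teacher, mask, eps=1e-8):
    """Mean (1 - cosine similarity) between student and teacher features on masked voxels.

    student, teacher: (N, C) arrays of voxel features. The teacher features are assumed
    to come from a frozen image foundation model, already projected into the voxel grid
    (a hypothetical preprocessing step not shown here).
    mask: (N,) boolean array selecting the voxels to distill (here: occupied voxels).
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # nothing to distill
    s = np.asarray(student, dtype=float)[mask]
    t = np.asarray(teacher, dtype=float)[mask]
    # L2-normalize each feature vector before taking the dot product.
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + eps)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + eps)
    cosine = np.sum(s * t, axis=1)
    return float(np.mean(1.0 - cosine))


def pretraining_loss(occ_logits, occ_target, student_feats, teacher_feats, lambda_distill=1.0):
    """Hypothetical combined objective: occupancy BCE + weighted feature distillation."""
    occ = occupancy_bce(occ_logits, occ_target)
    occupied = np.asarray(occ_target, dtype=float) > 0.5
    distill = masked_cosine_distillation(student_feats, teacher_feats, occupied)
    return occ + lambda_distill * distill


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_voxels, channels = 8, 4
    occ_logits = rng.normal(size=n_voxels)
    occ_target = (rng.random(n_voxels) > 0.5).astype(float)
    student = rng.normal(size=(n_voxels, channels))
    teacher = rng.normal(size=(n_voxels, channels))
    print(pretraining_loss(occ_logits, occ_target, student, teacher, lambda_distill=0.5))
```

Restricting distillation to occupied voxels is one plausible design choice under these assumptions: empty space has no meaningful semantics to transfer, so the geometric (occupancy) signal decides where the semantic (distillation) signal applies.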