Semantic segmentation is an effective way to perform scene understanding. Recently, segmentation in 3D Bird's Eye View (BEV) space has become popular because its output is used directly by the driving policy. However, there is limited work on BEV segmentation for surround-view fisheye cameras, which are commonly used in commercial vehicles. Because this task has no public real-world dataset, and existing synthetic datasets do not handle amodal regions caused by occlusion, we create a synthetic dataset using the Cognata simulator, comprising diverse road types, weather, and lighting conditions. We generalize BEV segmentation to work with any camera model, which is useful for mixing diverse cameras. We implement a baseline by applying cylindrical rectification to the fisheye images and using a standard LSS-based BEV segmentation model. We demonstrate that better performance can be achieved without undistortion, which has the adverse effects of increased runtime due to pre-processing, reduced field of view, and resampling artifacts. Further, we introduce a distortion-aware learnable BEV pooling strategy that is more effective for fisheye cameras. We extend the model with an occlusion reasoning module, which is critical for estimating occluded regions in BEV space. Qualitative results of DaF-BEVSeg are showcased in the video at https://streamable.com/ge4v51.
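To make the idea of a camera-model-agnostic lift concrete, the following is a minimal sketch rather than the implementation behind the abstract. It assumes that each camera model is reduced to a function mapping a pixel to a unit viewing ray, after which the lift into a BEV grid is identical for every camera. The equidistant fisheye model (r = f·θ), the function names, the camera-frame BEV grid (no extrinsics), and all parameters are illustrative assumptions.

```python
import math

# Hypothetical helpers for illustration only; none of these names come from the paper.

def unproject_pinhole(u, v, f, cx, cy):
    """Pixel -> unit ray for an ideal pinhole camera (x right, y down, z forward)."""
    x, y = (u - cx) / f, (v - cy) / f
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)

def unproject_equidistant(u, v, f, cx, cy):
    """Pixel -> unit ray for an equidistant fisheye model, r = f * theta (assumed model)."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)  # principal point looks along the optical axis
    theta = r / f               # angle between the ray and the optical axis
    s = math.sin(theta) / r
    return (dx * s, dy * s, math.cos(theta))

def lift_to_bev_cell(ray, distance, cell_size, grid_half_extent):
    """Place a point at `distance` along `ray` and return its BEV cell (row, col).

    Uses distance along the ray instead of z-depth, because fisheye rays can have
    z <= 0. The grid lies in the camera's x-z plane; returns None outside the grid.
    """
    x, _, z = (c * distance for c in ray)
    col = math.floor((x + grid_half_extent) / cell_size)
    row = math.floor((z + grid_half_extent) / cell_size)
    n = int(2 * grid_half_extent / cell_size)
    if 0 <= row < n and 0 <= col < n:
        return (row, col)
    return None

# Usage: the same lift works for either camera model.
ray = unproject_equidistant(320.0, 240.0, f=200.0, cx=320.0, cy=240.0)
cell = lift_to_bev_cell(ray, distance=10.0, cell_size=1.0, grid_half_extent=50.0)
```

Because only the unprojection function differs between cameras, pinhole and fisheye images could in principle feed the same BEV grid without a prior undistortion step. A real pipeline would additionally apply each camera's extrinsics to move rays into a shared vehicle frame.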