This paper introduces the new task of Fisheye Semantic Completion (FSC), in which the dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have a larger FoV than ordinary pinhole cameras, yet their unique imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applications, since important perception tasks such as semantic segmentation become very challenging within the blind zone. Previous works considered out-FoV outpainting and in-FoV segmentation separately. However, we observe that these two tasks are closely coupled. To jointly estimate the complete fisheye image and the scene semantics, we introduce FishDreamer, which builds on Vision Transformers (ViTs) enhanced with a novel Polar-aware Cross Attention (PCA) module that leverages dense context and guides semantically consistent content generation while accounting for different polar distributions. In addition to the novel task and architecture, we derive the Cityscapes-BF and KITTI360-BF datasets to facilitate training and evaluation on this new track. Our experiments demonstrate that the proposed FishDreamer outperforms methods that solve each task in isolation and surpasses alternative approaches on Fisheye Semantic Completion. Code and datasets will be available at https://github.com/MasterHow/FishDreamer.