Seeing only a tiny part of the whole is not knowing the full circumstance. Bird's-eye-view (BEV) perception, the process of obtaining allocentric maps from egocentric views, is restricted when it relies on a narrow Field of View (FoV) alone. In this work, mapping from 360° panoramas to BEV semantics, the 360BEV task, is established for the first time to achieve holistic representations of indoor scenes in a top-down view. Instead of relying on narrow-FoV image sequences, a single panoramic image with depth information is sufficient to generate a holistic BEV semantic map. To benchmark 360BEV, we present two indoor datasets, 360BEV-Matterport and 360BEV-Stanford, both of which include egocentric panoramic images and semantic segmentation labels, as well as allocentric semantic maps. Besides delving deep into different mapping paradigms, we propose a dedicated solution for panoramic semantic mapping, namely 360Mapper. Through extensive experiments, our methods achieve 44.32% and 45.78% in mIoU on the two datasets, respectively, surpassing previous counterparts with gains of +7.60% and +9.70% in mIoU. Code and datasets are available at the project page: https://jamycheung.github.io/360BEV.html.
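
To make the claim that one panorama with depth suffices for a top-down map more concrete, the following is a minimal geometric sketch. It is not 360Mapper and is not taken from the released code. It assumes an equirectangular panorama, depth given as Euclidean range along each ray in metres, and a square grid centred on the camera. All function names, the axis conventions, and the defaults (`map_size`, `cell_m`, `max_height`) are hypothetical choices for illustration.

```python
import math


def pano_pixel_to_point(u, v, depth, width, height):
    """Back-project one equirectangular pixel to a 3D point (right, up, forward).

    Assumed convention: u spans longitude -pi..pi (left to right), v spans
    latitude +pi/2..-pi/2 (top to bottom), sampled at pixel centres.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    ground = depth * math.cos(lat)  # horizontal distance from the camera
    return ground * math.sin(lon), depth * math.sin(lat), ground * math.cos(lon)


def point_to_cell(x, z, map_size, cell_m):
    """Map a horizontal position to (row, col); forward points up in the map."""
    col = math.floor(x / cell_m) + map_size // 2
    row = map_size // 2 - 1 - math.floor(z / cell_m)
    if 0 <= row < map_size and 0 <= col < map_size:
        return row, col
    return None  # the point falls outside the map


def project_labels_to_bev(labels, depth, map_size=500, cell_m=0.02, max_height=0.5):
    """Scatter per-pixel semantic labels into a top-down grid.

    Points higher than max_height above the camera (e.g. ceilings) are dropped;
    among the rest, the highest point in each cell wins. Empty cells hold -1.
    """
    height, width = len(labels), len(labels[0])
    bev = [[-1] * map_size for _ in range(map_size)]
    top = [[-math.inf] * map_size for _ in range(map_size)]
    for v in range(height):
        for u in range(width):
            d = depth[v][u]
            if d <= 0:  # treat non-positive depth as missing
                continue
            x, y, z = pano_pixel_to_point(u, v, d, width, height)
            if y > max_height:
                continue
            cell = point_to_cell(x, z, map_size, cell_m)
            if cell is not None and y > top[cell[0]][cell[1]]:
                top[cell[0]][cell[1]] = y
                bev[cell[0]][cell[1]] = labels[v][u]
    return bev
```

Because a single panorama covers all directions around the camera, this projection fills cells on every side of the map centre in one pass, which a narrow-FoV image cannot do without a sequence of views. A purely geometric projection like this leaves holes wherever surfaces are occluded or sparsely sampled.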