As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) provides complete scene perception from an ultra-wide field of view. Prevalent PASS methods that take 2D panoramic images as input focus on handling image distortions but overlook the 3D properties of the original $360^{\circ}$ data. As a result, their performance degrades substantially when the input panoramic images contain 3D disturbance. To be more robust to 3D disturbance, we propose the Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), which incorporates 3D spherical geometry knowledge. Specifically, we propose a spherical geometry-aware framework for PASS with three modules: spherical geometry-aware image projection, which takes input images with 3D disturbance into account; spherical deformable patch embedding, which adds a spherical geometry-aware constraint to the existing deformable patch embedding; and a panorama-aware loss, which reflects the pixel density of the original $360^{\circ}$ data. Experimental results on the Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU; when small 3D disturbances occur in the data, the stability of its performance improves by an order of magnitude. Our code and supplementary material are available at https://github.com/TencentARC/SGAT4PASS.
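To make the idea of "pixel density of the original $360^{\circ}$ data" concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the form of the panorama-aware loss; the sketch assumes an equirectangular input, where the spherical area covered by a pixel is proportional to the cosine of its latitude, and uses that factor as a per-row loss weight. The function names `latitude_weights` and `weighted_mean_loss` are hypothetical.

```python
import math


def latitude_weights(height):
    """Per-row weights proportional to cos(latitude) for an equirectangular image.

    Assumption: row 0 is the top of the image (near the north pole) and row
    centers are spread evenly over latitudes in (-pi/2, pi/2).
    Weights are normalized so that their mean is 1.
    """
    if height <= 0:
        raise ValueError("height must be positive")
    raw = []
    for row in range(height):
        # Latitude of the row center: +pi/2 at the top edge, -pi/2 at the bottom edge.
        lat = (0.5 - (row + 0.5) / height) * math.pi
        raw.append(math.cos(lat))
    mean = sum(raw) / height
    return [w / mean for w in raw]


def weighted_mean_loss(per_pixel_loss):
    """Average a 2D list (rows x columns) of per-pixel losses using row weights."""
    height = len(per_pixel_loss)
    weights = latitude_weights(height)
    total = 0.0
    norm = 0.0
    for w, row in zip(weights, per_pixel_loss):
        total += w * sum(row)
        norm += w * len(row)
    return total / norm
```

With this weighting, errors near the poles, where the equirectangular image oversamples the sphere, contribute less to the average than errors near the equator. In a training pipeline the same row weights could be multiplied into an unreduced per-pixel cross-entropy map before averaging.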