Panoramic Semantic Segmentation (PASS) is an important and challenging problem in computer vision that aims to provide complete scene perception from an ultra-wide field of view. Most PASS methods focus on spherical geometry with RGB input alone, or use depth information in its original or HHA format, neither of which fully exploits the geometry of panoramic images. To address these shortcomings, we propose REL-SF4PASS, which combines our REL depth representation based on cylindrical coordinates with Spherical-dynamic Multi-Modal Fusion (SMMF). REL consists of Rectified Depth, Elevation-Gained Vertical Inclination Angle, and Lateral Orientation Angle, which together fully represent 3D space in a cylindrical-coordinate style along with the surface normal direction. SMMF applies different fusion strategies to different regions of the panoramic image, ensuring diversity of fusion across regions and reducing the breakage caused by unrolling the cylinder's side surface in the ERP projection. Experimental results show that REL-SF4PASS considerably improves performance and robustness on the popular Stanford2D3D Panoramic benchmark: it gains 2.35% average mIoU across all 3 folds and reduces performance variance by approximately 70% under 3D disturbance.
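To make the cylindrical-coordinate idea concrete, the following is a minimal sketch of decomposing an equirectangular (ERP) depth map into cylindrical-style components. The function name `erp_to_rel` and the exact component definitions are assumptions for illustration: standard cylindrical conventions are used (horizontal distance to the vertical axis, signed height, longitude angle), which may differ in detail from the paper's REL formulation.

```python
import numpy as np

def erp_to_rel(depth, h, w):
    # Hypothetical sketch: map an ERP radial depth map (h x w) to
    # cylindrical-style components, assuming standard ERP angle conventions.
    v = (np.arange(h) + 0.5) / h           # vertical pixel positions in [0, 1]
    u = (np.arange(w) + 0.5) / w           # horizontal pixel positions in [0, 1]
    theta = (0.5 - v) * np.pi              # latitude per row, in [-pi/2, pi/2]
    phi = (u - 0.5) * 2.0 * np.pi          # longitude per column, in [-pi, pi]
    theta = np.broadcast_to(theta[:, None], (h, w))
    phi = np.broadcast_to(phi[None, :], (h, w))

    # Cylindrical decomposition of the radial depth d:
    rectified = depth * np.cos(theta)      # horizontal distance to the cylinder axis
    elevation = depth * np.sin(theta)      # signed height along the vertical axis
    lateral = phi                          # lateral orientation (longitude) angle
    return np.stack([rectified, elevation, lateral], axis=0)
```

Note that the first two channels recover the radial depth by construction (`rectified**2 + elevation**2 == depth**2`), so no 3D information is lost in the decomposition.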