This paper addresses an interesting yet challenging problem: source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a model trained on pinhole images (i.e., the source) and unlabeled panoramic images (i.e., the target). The problem is nontrivial because of semantic mismatches, style discrepancies, and the inevitable distortion of panoramic images. To this end, we propose a novel method that utilizes Tangent Projection (TP), which has less distortion, and meanwhile slices the equirectangular projection (ERP) into patches with a fixed field of view (FoV) to mimic pinhole images. Both projections prove effective in extracting knowledge from the source model. However, the distinct projection discrepancies between the source and target domains impede direct knowledge transfer; we therefore propose a panoramic prototype adaptation module (PPAM) that integrates panoramic prototypes from the extracted knowledge for adaptation. We then impose loss constraints on both the predictions and the prototypes, and propose a cross-dual attention module (CDAM) at the feature level to better align spatial and channel characteristics across domains and projections. The knowledge extraction and transfer processes are updated synchronously to achieve the best performance. Extensive experiments on synthetic and real-world benchmarks, covering both outdoor and indoor scenarios, demonstrate that our method significantly outperforms prior SFUDA methods for pinhole-to-panoramic adaptation.
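The abstract credits Tangent Projection (TP) with lower distortion than ERP. To illustrate what sampling one fixed-FoV tangent patch from an ERP panorama involves, here is a minimal sketch. It assumes the standard inverse gnomonic (tangent-plane) projection, nearest-neighbour lookup, and an ERP image whose columns span longitude [-π, π) and whose rows span latitude from π/2 (top) to -π/2 (bottom). The function name `tangent_patch` and its parameters are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def tangent_patch(erp: np.ndarray, lat0: float, lon0: float,
                  fov_deg: float = 90.0, size: int = 64) -> np.ndarray:
    """Sample a size x size tangent-plane patch centred at (lat0, lon0) radians.

    Hypothetical helper: inverse gnomonic projection + nearest-neighbour lookup.
    """
    H, W = erp.shape[:2]
    # A patch with field of view `fov_deg` spans [-tan(fov/2), tan(fov/2)] on the plane.
    half = np.tan(np.deg2rad(fov_deg) / 2.0)
    xs = np.linspace(-half, half, size)   # x grows to the right
    ys = np.linspace(half, -half, size)   # row 0 is the top of the patch (y up)
    x, y = np.meshgrid(xs, ys)

    # Inverse gnomonic projection: plane (x, y) -> sphere (lat, lon).
    rho = np.sqrt(x ** 2 + y ** 2)
    c = np.arctan(rho)                    # angular distance from the tangent point
    sin_c, cos_c = np.sin(c), np.cos(c)
    safe_rho = np.where(rho == 0, 1.0, rho)  # avoid 0/0 at the tangent point
    lat = np.arcsin(np.clip(
        cos_c * np.sin(lat0) + y * sin_c * np.cos(lat0) / safe_rho, -1.0, 1.0))
    lon = lon0 + np.arctan2(
        x * sin_c, rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)
    lon = (lon + np.pi) % (2 * np.pi) - np.pi  # wrap longitude into [-pi, pi)

    # Sphere (lat, lon) -> ERP pixel indices, clamped to the image bounds.
    u = np.clip(((lon / (2 * np.pi) + 0.5) * W).astype(int), 0, W - 1)
    v = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return erp[v, u]
```

Calling the helper at several tangent points (for instance, a regular grid of longitudes) would tile the panorama into low-distortion patches. Bilinear rather than nearest-neighbour sampling would be a natural refinement.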