Recent work has leveraged self-supervised depth estimation as guidance in unsupervised domain adaptation (UDA) for semantic segmentation. Prior methods, however, overlook both the discrepancy between semantic and depth features and the reliability of feature fusion, which leads to suboptimal segmentation performance. To address these issues, we propose a novel UDA framework called SMART (croSs doMain semAntic segmentation based on eneRgy esTimation) that uses Energy-Based Models (EBMs) to obtain task-adaptive features and achieve reliable feature fusion for semantic segmentation with self-supervised depth estimates. Our framework incorporates two novel components: an energy-based feature fusion (EB2F) module and an energy-based reliable fusion assessment (RFA) module. The EB2F module produces task-adaptive semantic and depth features by explicitly measuring and reducing their discrepancy with Hopfield energy, which enables better feature fusion. The RFA module evaluates the reliability of the feature fusion with an energy score to improve the effectiveness of depth guidance. Extensive experiments on two datasets demonstrate that our method achieves significant performance gains over prior work, validating the effectiveness of our energy-based learning approach.
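To make the two energy quantities named above concrete, the following is a minimal sketch and not the SMART implementation. It assumes the commonly used modern (continuous) Hopfield energy with additive constants omitted, and the free-energy score computed from classification logits; the abstract does not state which exact forms SMART uses. The function names, the `beta` and `temperature` parameters, and the toy feature vectors are all hypothetical.

```python
import math


def logsumexp(values, scale=1.0):
    """Numerically stable (1 / scale) * log(sum(exp(scale * v)))."""
    scaled = [scale * v for v in values]
    m = max(scaled)
    return (m + math.log(sum(math.exp(s - m) for s in scaled))) / scale


def hopfield_energy(query, patterns, beta=1.0):
    """Assumed form: modern Hopfield energy of `query` against stored
    `patterns`, with additive constants dropped.

    E = -lse(beta, patterns @ query) + 0.5 * ||query||^2
    A lower energy means the query is better aligned with the patterns.
    """
    sims = [sum(q * p for q, p in zip(query, pattern)) for pattern in patterns]
    return -logsumexp(sims, scale=beta) + 0.5 * sum(q * q for q in query)


def energy_score(logits, temperature=1.0):
    """Assumed form: free-energy score -T * log(sum(exp(logit / T))).

    A lower (more negative) score indicates a more confident prediction.
    """
    return -logsumexp(logits, scale=1.0 / temperature)


# Hypothetical toy features: two stored "semantic" patterns, two "depth" queries.
semantic_patterns = [[1.0, 0.0], [0.0, 1.0]]
aligned_depth = [1.0, 0.0]
misaligned_depth = [-1.0, 0.0]

e_aligned = hopfield_energy(aligned_depth, semantic_patterns)
e_misaligned = hopfield_energy(misaligned_depth, semantic_patterns)

# Hypothetical logits from a fused feature: confident vs. ambiguous.
s_confident = energy_score([10.0, 0.0, 0.0])
s_ambiguous = energy_score([0.0, 0.0, 0.0])
```

Under these assumptions, `e_aligned` is lower than `e_misaligned`, which illustrates how an energy can serve as a discrepancy measure between two feature sets, and `s_confident` is lower than `s_ambiguous`, which illustrates how an energy score can rank fused predictions by reliability.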