Accurate urban maps provide essential information to support sustainable urban development. Recent urban mapping methods use multi-modal deep neural networks to fuse Synthetic Aperture Radar (SAR) and optical data. However, multi-modal networks may rely on just one modality due to the greedy nature of learning. In turn, this imbalanced utilization of modalities can negatively affect the generalization ability of a network. In this paper, we investigate the utilization of SAR and optical data for urban mapping. To that end, we use a dual-branch network architecture in which intermediate fusion modules share information between the uni-modal branches. A cut-off mechanism in the fusion modules can stop the flow of information between the branches, which we use to estimate the network's dependence on SAR and optical data. While our experiments on the SEN12 Global Urban Mapping dataset show that good performance can be achieved with conventional SAR-optical data fusion (F1 score = 0.682 $\pm$ 0.014), we also observe a clear under-utilization of optical data. Therefore, future work is required to investigate whether a more balanced utilization of SAR and optical data can lead to performance improvements.
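The cut-off idea can be illustrated with a minimal sketch. The abstract does not specify how the fusion modules combine features, so the additive exchange below is an assumption, and the names `fuse` and `f1_score` are hypothetical rather than taken from the paper's implementation. The sketch only shows the principle: with the cut-off enabled, each branch keeps its own features, and comparing the F1 score of the resulting predictions against the fused case gives a rough indication of how much the network depends on the other modality.

```python
import numpy as np


def fuse(feat_sar, feat_opt, cut_off=False):
    """Hypothetical fusion module for two uni-modal feature maps.

    Assumption: information is shared by element-wise addition. With
    cut_off=True no information flows between the branches.
    """
    if cut_off:
        # Each branch continues with its own features only.
        return feat_sar.copy(), feat_opt.copy()
    # Each branch receives the other branch's features.
    return feat_sar + feat_opt, feat_opt + feat_sar


def f1_score(pred, target):
    """F1 score for binary masks (1 = urban, 0 = non-urban)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


# Toy feature maps standing in for the SAR and optical branches.
sar = np.array([[1.0, 2.0], [3.0, 4.0]])
opt = np.array([[0.5, 0.5], [0.5, 0.5]])

fused_sar, fused_opt = fuse(sar, opt)                # information is shared
solo_sar, solo_opt = fuse(sar, opt, cut_off=True)    # information flow stopped
```

In an actual experiment, predictions obtained with and without the cut-off would each be scored with `f1_score` against the reference labels; a small drop when one modality is cut off would suggest that the network makes little use of it.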