Self-supervised learning through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) image representation learning, and thus holds significant potential for content-based image retrieval (CBIR) from ever-growing RS image archives. However, existing studies on MAEs in RS assume that the considered RS images are acquired by a single image sensor, and are therefore applicable only to uni-modal CBIR problems. The effectiveness of MAEs for cross-sensor CBIR, which aims to search for semantically similar images across different image modalities, has not yet been explored. In this paper, we take the first step toward exploring the effectiveness of MAEs for sensor-agnostic CBIR in RS. To this end, we present a systematic overview of the possible adaptations of the vanilla MAE for exploiting masked image modeling on multi-sensor RS image archives (referred to as cross-sensor masked autoencoders [CSMAEs]). Based on the different adjustments applied to the vanilla MAE, we introduce several CSMAE models and provide an extensive experimental analysis of them. Finally, we derive a guideline for exploiting masked image modeling in uni-modal and cross-modal CBIR problems in RS. The code of this work is publicly available at https://github.com/jakhac/CSMAE.
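To make the masked image modeling step more concrete, the following is a minimal sketch of the random patch masking used by the vanilla MAE, applied independently to the patch sequence of each sensor. It is an illustrative assumption and not the CSMAE implementation from the linked repository: the function names `random_patch_masking` and `mask_multi_sensor`, the sensor keys, and the 0.75 default mask ratio (the value commonly used for the vanilla MAE) are hypothetical choices, and the CSMAE models may combine or share masks across modalities differently.

```python
import random
from typing import Dict, List, Optional, Tuple


def random_patch_masking(
    num_patches: int, mask_ratio: float = 0.75, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split patch indices into visible and masked sets (vanilla MAE-style random masking)."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # uniform random permutation of patch positions
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(indices[:num_keep])  # fed to the encoder
    masked = sorted(indices[num_keep:])  # reconstructed by the decoder
    return visible, masked


def mask_multi_sensor(
    patches_by_sensor: Dict[str, list], mask_ratio: float = 0.75, seed: int = 0
) -> Dict[str, Tuple[list, List[int]]]:
    """Hypothetical multi-sensor setup: mask each sensor's patches independently.

    Returns, per sensor, the visible patches and the indices of the masked patches.
    """
    out = {}
    for offset, (sensor, patches) in enumerate(patches_by_sensor.items()):
        visible, masked = random_patch_masking(len(patches), mask_ratio, seed + offset)
        out[sensor] = ([patches[j] for j in visible], masked)
    return out
```

For a 224×224 input split into 16×16 patches (196 patches), a mask ratio of 0.75 leaves 49 visible patches per sensor for the encoder, and the decoder is trained to reconstruct the remaining 147 masked patches.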