Self-supervised learning through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) image representation learning, and thus holds significant potential for content-based image retrieval (CBIR) from ever-growing RS image archives. However, the existing MAE-based CBIR studies in RS assume that the considered RS images are acquired by a single image sensor, and thus are only suitable for uni-modal CBIR problems. The effectiveness of MAEs for cross-sensor CBIR, which aims to search semantically similar images across different image modalities, has not yet been explored. In this paper, we take the first step toward exploring the effectiveness of MAEs for sensor-agnostic CBIR in RS. To this end, we present a systematic overview of possible adaptations of the vanilla MAE for exploiting masked image modeling on multi-sensor RS image archives (denoted as cross-sensor masked autoencoders [CSMAEs]) in the context of CBIR. Based on different adjustments applied to the vanilla MAE, we introduce different CSMAE models and provide an extensive experimental analysis of them. We finally derive a guideline for exploiting masked image modeling for uni-modal and cross-modal CBIR problems in RS. The code of this work is publicly available at https://github.com/jakhac/CSMAE.