With deep learning methods now ubiquitous in computer vision and remote sensing, the need for labelled data is growing constantly. However, annotation is often long and tedious, particularly when expert knowledge is needed to produce reliable labels. To alleviate this need for annotations, several self-supervised methods have recently been proposed in the literature. The core principle behind these methods is to learn an image encoder using solely unlabelled data samples. In Earth observation, there are opportunities to exploit domain-specific remote sensing image data in order to improve these methods. Specifically, by leveraging the geographical position associated with each image, it is possible to cross-reference a location captured by multiple sensors, yielding multiple views of the same location. In this paper, we briefly review the core principles behind so-called joint-embedding methods and investigate the use of multiple remote sensing modalities in self-supervised pre-training. We evaluate the final performance of the resulting encoders on the task of methane source classification.