This work proposes a hybrid unsupervised/supervised learning method for pretraining models for Earth observation downstream tasks in which only a handful of labels, denoting very general semantic concepts, are available. We combine a contrastive pretraining approach with a pretext task that predicts spatially coarse elevation maps, which are commonly available worldwide. The intuition is that elevation is generally correlated to some degree with the targets of many remote sensing tasks, which allows the model to pre-learn useful representations. We assess our approach on a segmentation downstream task whose labels aggregate many possible subclasses (pixel-level classification of farmland vs. other) and on a binary image classification task derived from it, using a dataset covering north-east Colombia. In both cases we pretrain our models with 39K unlabeled images, fine-tune on the downstream task with only 80 labeled images, and test with 2944 labeled images. Our experiments show that our methods, GLCNet+Elevation for segmentation and SimCLR+Elevation for classification, outperform their counterparts without the elevation pretext task in terms of accuracy and macro-average F1, which supports the notion that including additional information correlated with the downstream targets can lead to improved performance.
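To make the idea of pairing a contrastive objective with an elevation pretext task concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a SimCLR-style NT-Xent contrastive loss, a mean-squared-error loss for the coarse elevation prediction, and a weighted sum of the two. The function names, the `weight` parameter, and the choice of MSE are illustrative assumptions; the abstract does not specify how the two objectives are combined, and GLCNet's local contrastive term is not modeled here.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for two views, each of shape (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own candidate
    # The positive for row i is the other view of the same image.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )
    return -log_prob[np.arange(2 * n), pos].mean()


def elevation_loss(elev_pred, elev_true):
    """Assumed pretext loss: MSE between predicted and coarse elevation maps."""
    return np.mean((elev_pred - elev_true) ** 2)


def combined_pretraining_loss(z1, z2, elev_pred, elev_true, weight=1.0):
    """Hypothetical combination: contrastive term plus weighted pretext term."""
    return nt_xent_loss(z1, z2) + weight * elevation_loss(elev_pred, elev_true)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z1 = rng.normal(size=(8, 16))            # embeddings of view 1
    z2 = z1 + 0.1 * rng.normal(size=(8, 16))  # embeddings of view 2
    elev_true = rng.normal(size=(8, 4, 4))    # coarse elevation targets
    elev_pred = elev_true + 0.5               # deliberately biased prediction
    print(combined_pretraining_loss(z1, z2, elev_pred, elev_true))
```

In this sketch, setting `weight=0.0` recovers plain contrastive pretraining, which corresponds to the baselines without the elevation pretext task.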