This work proposes a hybrid unsupervised/supervised learning method for pretraining models for earth observation downstream tasks in which only a handful of labels, denoting very general semantic concepts, are available. We combine a contrastive pretraining approach with a pretext task that predicts spatially coarse elevation maps, which are commonly available worldwide. The intuition is that elevation is generally correlated with the targets of many remote sensing tasks, so the model can pre-learn useful representations. We assess our approach on a segmentation downstream task whose labels group many possible subclasses (pixel-level classification of farmlands vs. other) and on a binary image classification task derived from it, using a dataset covering the north-east of Colombia. In both cases we pretrain our models with 39K unlabeled images, fine-tune on the downstream task with only 80 labeled images, and test with 2944 labeled images. Our experiments show that our methods, GLCNet+Elevation for segmentation and SimCLR+Elevation for classification, outperform their counterparts without the elevation pretext task in terms of accuracy and macro-average F1, which supports the notion that including additional information correlated with the targets of downstream tasks can lead to improved performance.
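As a rough illustration of how a contrastive objective and an elevation pretext task could be combined into a single pretraining loss, the sketch below adds a SimCLR-style NT-Xent term to a regression term on the coarse elevation map. It is a minimal NumPy sketch under stated assumptions, not the formulation used in this work: the mean-squared-error elevation loss, the `weight` and `temperature` parameters, and all function names are hypothetical, and the GLCNet variant uses its own contrastive terms that are not reproduced here.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for two views of a batch, each of shape (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # The positive for row i is the other view of the same image: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


def elevation_loss(pred, target):
    """Assumed pretext loss: mean squared error on the coarse elevation map."""
    return float(np.mean((pred - target) ** 2))


def combined_loss(z1, z2, elev_pred, elev_target, weight=1.0, temperature=0.5):
    """Assumed combination: contrastive term plus weighted elevation term."""
    contrastive = nt_xent_loss(z1, z2, temperature)
    return contrastive + weight * elevation_loss(elev_pred, elev_target)
```

In a setup like this, `weight` would control how strongly the elevation signal shapes the learned representation relative to the contrastive term; its value here is arbitrary.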