Remote sensing images are useful for a wide variety of planet monitoring applications, from tracking deforestation to tackling illegal fishing. The Earth is extremely diverse: the number of potential tasks in remote sensing images is massive, and the sizes of features range from several kilometers to just tens of centimeters. However, creating generalizable computer vision methods is a challenge, in part because no large-scale dataset captures these diverse features across many tasks. In this paper, we present SatlasPretrain, a remote sensing dataset that is large in both breadth and scale, combining Sentinel-2 and NAIP images with 302M labels under 137 categories and seven label types. We evaluate eight baselines and a proposed method on SatlasPretrain, and find that there is substantial room for improvement in addressing research challenges specific to remote sensing, including processing image time series that consist of images from very different types of sensors, and taking advantage of long-range spatial context. Moreover, we find that pre-training on SatlasPretrain substantially improves performance on downstream tasks, increasing average accuracy by 18% over ImageNet and 6% over the next best baseline. The dataset, pre-trained model weights, and code are available at https://satlas-pretrain.allen.ai/.