SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding

Remote sensing images are useful for a wide variety of planet monitoring applications, from tracking deforestation to tackling illegal fishing. The Earth is extremely diverse: the number of potential tasks in remote sensing images is massive, and the sizes of features range from several kilometers to just tens of centimeters. However, creating generalizable computer vision methods is a challenge, in part due to the lack of a large-scale dataset that captures these diverse features for many tasks. In this paper, we present SatlasPretrain, a remote sensing dataset that is large in both breadth and scale, combining Sentinel-2 and NAIP images with 302M labels under 137 categories and seven label types. We evaluate eight baselines and a proposed method on SatlasPretrain, and find that there is substantial room for improvement in addressing research challenges specific to remote sensing, including processing image time series that consist of images from very different types of sensors, and taking advantage of long-range spatial context. Moreover, we find that pre-training on SatlasPretrain substantially improves performance on downstream tasks, increasing average accuracy by 18% over ImageNet and 6% over the next best baseline. The dataset, pre-trained model weights, and code are available at https://satlas-pretrain.allen.ai/.

