The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers a unique opportunity: data from different modalities and sensors can be paired automatically based on geographic location and time, at virtually no human labour cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on ConvNeXt V2, a fully convolutional masked autoencoder (MAE) architecture. Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images, across several downstream tasks including image classification and semantic segmentation. We find that pretraining with multi-modal pretext tasks notably improves linear probing performance compared to pretraining on optical satellite images only. It also yields better label efficiency and parameter efficiency, both of which are crucial in global-scale applications.
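To make the described setup concrete, below is a minimal, hypothetical PyTorch sketch of the multi-pretext MAE idea: a shared fully convolutional encoder processes masked optical input, and one lightweight decoder per pretext modality reconstructs a co-registered target (e.g., optical pixels, elevation). The toy encoder, module names, channel counts, and loss weights are all illustrative assumptions, not the paper's ConvNeXt V2 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPretextMAE(nn.Module):
    """Sketch of a multi-pretext MAE: one shared encoder on masked optical
    input, one shallow decoder head per pretext modality. Placeholder
    modules only; the paper builds on a ConvNeXt V2 backbone."""

    def __init__(self, embed_dim: int, pretext_channels: dict):
        super().__init__()
        # Toy stand-in for a fully convolutional backbone; any encoder
        # producing a (B, embed_dim, h, w) feature map would fit here.
        self.encoder = nn.Sequential(
            nn.Conv2d(12, embed_dim, kernel_size=4, stride=4),
            nn.GELU(),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
        )
        # One 1x1-conv head per target modality, e.g.
        # {"optical": 12, "elevation": 1}; a categorical target such as
        # land cover would get n_classes channels and a cross-entropy loss.
        self.decoders = nn.ModuleDict({
            name: nn.Conv2d(embed_dim, ch, kernel_size=1)
            for name, ch in pretext_channels.items()
        })

    def forward(self, masked_optical):
        feats = self.encoder(masked_optical)      # (B, embed_dim, h, w)
        return {name: head(feats) for name, head in self.decoders.items()}

def multi_pretext_loss(preds, targets, weights):
    """Weighted sum of per-modality reconstruction losses (MSE for the
    continuous targets used in this sketch)."""
    return sum(weights[n] * F.mse_loss(p, targets[n]) for n, p in preds.items())

# Usage: mask a 12-band Sentinel-2-like tile, then reconstruct the optical
# pixels and a co-registered elevation map as multi-modal pretext targets.
model = MultiPretextMAE(embed_dim=64, pretext_channels={"optical": 12, "elevation": 1})
x = torch.randn(2, 12, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.6).float()   # keep roughly 40% of pixels
preds = model(x * mask)
targets = {"optical": torch.randn(2, 12, 16, 16),
           "elevation": torch.randn(2, 1, 16, 16)}
loss = multi_pretext_loss(preds, targets, {"optical": 1.0, "elevation": 0.5})
loss.backward()
```

The key design choice this sketch illustrates is that all pretext modalities share a single encoder, so the gradients from every reconstruction task shape one general-purpose representation of the optical input.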