Foundation Models for Generalist Geospatial Artificial Intelligence

Johannes Jakubik, Sujit Roy, C. E. Phillips, Paolo Fraccaro, Denys Godwin, Bianca Zadrozny, Daniela Szwarcman, Carlos Gomes, Gabby Nyirjesy, Blair Edwards, Daiki Kimura, Naomi Simumba, Linsong Chu, S. Karthik Mukkavilli, Devyani Lambhate, Kamal Das, Ranjini Bangalore, Dario Oliveira, Michal Muszynski, Kumar Ankur, Muthukumaran Ramasubramanian, Iksha Gurung, Sam Khallaghi, Hanxi Li, Michael Cecil, Maryam Ahmadi, Fatemeh Kordi, Hamed Alemohammad, Manil Maskey, Raghu Ganti, Kommy Weldemariam, Rahul Ramachandran

Significant progress in the development of highly adaptable and reusable Artificial Intelligence (AI) models is expected to have a major impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundation models on extensive geospatial data. We have used this framework to create Prithvi, a transformer-based geospatial foundation model pre-trained on more than 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. Our study demonstrates the efficacy of the framework by successfully fine-tuning Prithvi on a range of Earth observation tasks that previous work on foundation models has not tackled: multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. Our experiments show that the pre-trained model accelerates fine-tuning compared to starting from randomly initialized weights. In addition, pre-trained Prithvi compares well against the state of the art, e.g., outperforming a conditional GAN model in multi-temporal cloud imputation by up to 5pp (or 5.7%) in the structural similarity index. Finally, because labeled data is scarce in Earth observation, we gradually reduce the quantity of labeled data available for fine-tuning to evaluate data efficiency, and demonstrate that it can be decreased significantly without affecting the model's accuracy. The pre-trained 100-million-parameter model and the corresponding fine-tuning workflows have been released publicly through Hugging Face as open-source contributions to the global Earth sciences community.
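The abstract reports the cloud imputation gain both in percentage points (5pp) and as a relative improvement (5.7%). The following minimal sketch shows how the two figures relate, assuming both describe the same comparison and that the structural similarity index is expressed on a 0 to 1 scale. The function names are hypothetical, and the baseline printed here is only what the two figures imply arithmetically, not a value reported by the authors.

```python
def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Baseline score implied by an absolute gain and the matching relative gain.

    relative_gain = absolute_gain / baseline, so baseline = absolute_gain / relative_gain.
    """
    return absolute_gain / relative_gain


def relative_improvement(baseline: float, absolute_gain: float) -> float:
    """Relative improvement of (baseline + absolute_gain) over baseline."""
    return absolute_gain / baseline


# Figures taken from the abstract: 5 percentage points, 5.7% relative.
ABSOLUTE_GAIN = 0.05   # 5pp on a 0..1 SSIM scale (scale is an assumption)
RELATIVE_GAIN = 0.057  # 5.7%

# Implied (not reported) baseline and improved scores.
baseline = implied_baseline(ABSOLUTE_GAIN, RELATIVE_GAIN)
improved = baseline + ABSOLUTE_GAIN

print(f"implied baseline SSIM: {baseline:.3f}")
print(f"implied improved SSIM: {improved:.3f}")
print(f"relative improvement:  {relative_improvement(baseline, ABSOLUTE_GAIN):.1%}")
```

Under these assumptions, a 5pp gain matches a 5.7% relative gain only when the baseline score is close to 0.88, which is why the two numbers differ: percentage points measure the absolute difference, while the percentage measures that difference relative to the baseline.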
