Progress in developing highly adaptable and reusable artificial intelligence (AI) models is expected to have a significant impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundation models on extensive geospatial data. We have used this framework to create Prithvi, a transformer-based geospatial foundation model pre-trained on more than 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. We demonstrate that the framework can fine-tune Prithvi for a range of Earth observation tasks that previous work on foundation models has not tackled: multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. Our experiments show that the pre-trained model accelerates fine-tuning compared with starting from randomly initialized weights. In addition, pre-trained Prithvi compares well against the state of the art, e.g., outperforming a conditional GAN model in multi-temporal cloud imputation by up to 5 percentage points (or 5.7%) in the structural similarity index. Finally, because labeled data are scarce in Earth observation, we evaluate data efficiency by gradually reducing the amount of labeled data used for fine-tuning, and show that it can be decreased significantly without affecting the model's accuracy. The pre-trained 100-million-parameter model and the corresponding fine-tuning workflows have been released publicly as open-source contributions to the global Earth sciences community through Hugging Face.
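The cloud imputation result is quoted both as an absolute gain (5 percentage points) and as a relative gain (5.7%). As a minimal sketch of how the two figures relate, the snippet below back-computes the baseline score they would imply. It assumes both figures describe the same comparison and are exact rather than rounded; the function name `implied_baseline` is hypothetical, and the resulting values are back-of-envelope arithmetic, not numbers reported in the paper.

```python
def implied_baseline(abs_gain_pp: float, rel_gain_pct: float) -> float:
    """Return the baseline score (in percent) implied by an absolute gain
    in percentage points and the same gain expressed as a relative percentage.

    relative gain = absolute gain / baseline, so
    baseline = absolute gain / relative gain.
    """
    if rel_gain_pct <= 0:
        raise ValueError("relative gain must be positive")
    return abs_gain_pp / (rel_gain_pct / 100.0)


# Figures quoted in the abstract, treated as exact for illustration only.
baseline = implied_baseline(abs_gain_pp=5.0, rel_gain_pct=5.7)
improved = baseline + 5.0

print(f"implied baseline: {baseline:.1f}%")  # implied baseline: 87.7%
print(f"implied improved: {improved:.1f}%")  # implied improved: 92.7%
```

Under these assumptions, a 5-point gain that equals 5.7% in relative terms corresponds to a baseline structural similarity of roughly 0.88. Because the abstract states the gain as "up to" 5 points, this is best read as an order-of-magnitude check, not a reported score.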