Geospatial Foundation Models (GFMs) have emerged as powerful tools for extracting representations from Earth observation data, but their evaluation remains inconsistent and narrow. Existing works often evaluate on suboptimal downstream datasets and tasks that are too easy or too narrow, limiting the usefulness of the evaluations for assessing the real-world applicability of GFMs. Additionally, current evaluation protocols lack diversity: they fail to account for the multiplicity of image resolutions, sensor types, and temporalities, which further complicates the assessment of GFM performance. In particular, most existing benchmarks are geographically biased towards North America and Europe, calling into question the global applicability of GFMs. To overcome these challenges, we introduce PANGAEA, a standardized evaluation protocol that covers a diverse set of datasets, tasks, resolutions, sensor modalities, and temporalities. It establishes a robust and broadly applicable benchmark for GFMs. We evaluate the most popular openly available GFMs on this benchmark and analyze their performance across several domains. In particular, we compare these models to supervised baselines (e.g., UNet and vanilla ViT) and assess their effectiveness when faced with limited labeled data. Our findings highlight the limitations of GFMs under different scenarios, showing that they do not consistently outperform supervised models. PANGAEA is designed to be highly extensible, allowing for the seamless inclusion of new datasets, models, and tasks in future research. By releasing the evaluation code and benchmark, we aim to enable other researchers to replicate our experiments and build upon our work, fostering a more principled evaluation protocol for large pre-trained geospatial models. The code is available at https://github.com/VMarsocci/pangaea-bench.