Vision foundation models are a new frontier in Geospatial Artificial Intelligence (GeoAI), an interdisciplinary research area that applies and extends AI for geospatial problem solving and geographic knowledge discovery. Their promise lies in enabling powerful image analysis by learning and extracting important image features from vast amounts of geospatial data. This paper evaluates the performance of the first-of-its-kind geospatial foundation model, IBM-NASA's Prithvi, in supporting a crucial geospatial analysis task: flood inundation mapping. The model is compared with convolutional neural network and vision transformer-based architectures in terms of mapping accuracy for flooded areas. Experiments use the benchmark dataset Sen1Floods11, and the models' predictability, generalizability, and transferability are evaluated on both a test dataset and a dataset that is completely unseen by the model. Results show that the Prithvi model transfers well, with performance advantages in segmenting flooded areas in previously unseen regions. The findings also indicate areas where the Prithvi model could improve: adopting multi-scale representation learning, developing more end-to-end pipelines for high-level image analysis tasks, and offering more flexibility in the choice of input data bands.