Geospatial technologies are becoming increasingly essential for a wide range of applications, including agriculture, urban planning, and disaster response. To improve the applicability and performance of deep learning models on these geospatial tasks, various works have begun investigating foundation models for this domain. Researchers have explored two prominent approaches for introducing such models in geospatial applications, but each suffers from either limited performance benefit or prohibitive training cost. Therefore, in this work, we propose a novel paradigm for building highly effective geospatial foundation models with minimal resource cost and carbon impact. We first construct a compact yet diverse dataset, which we term GeoPile, from multiple sources to promote feature diversity. Then, we investigate the potential of continual pretraining from large-scale ImageNet-22k models and propose a multi-objective continual pretraining paradigm, which leverages the strong representations of ImageNet while simultaneously providing the freedom to learn valuable in-domain features. Our approach outperforms previous state-of-the-art geospatial pretraining methods in an extensive evaluation on seven downstream datasets covering tasks such as change detection, classification, multi-label classification, semantic segmentation, and super-resolution.
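The abstract does not state which objectives the multi-objective continual pretraining combines. Purely as a hedged illustration, one common way to pair "keep the strong ImageNet representations" with "freedom to learn in-domain features" is to sum a feature-distillation term (pulling student features toward a frozen ImageNet-22k teacher) and a masked-image-modeling reconstruction term on geospatial images. The sketch below assumes that pairing; the function names and the weights `distill_weight` and `recon_weight` are hypothetical, and it is not presented as the authors' exact formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; the epsilon guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def distillation_loss(student_feats: List[Vector], teacher_feats: List[Vector]) -> float:
    """Hypothetical term: mean cosine distance between student tokens and
    tokens from a frozen ImageNet-22k teacher (assumed, not from the source)."""
    pairs = list(zip(student_feats, teacher_feats))
    return sum(cosine_distance(s, t) for s, t in pairs) / len(pairs)


def masked_reconstruction_loss(
    pred_patches: List[Vector], target_patches: List[Vector], mask: List[bool]
) -> float:
    """Hypothetical term: per-patch MSE averaged over masked patches only
    (mask[i] is True when patch i was hidden from the student)."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
        for pred, target, hidden in zip(pred_patches, target_patches, mask)
        if hidden
    ]
    return sum(errors) / len(errors) if errors else 0.0


def multi_objective_loss(
    student_feats: List[Vector],
    teacher_feats: List[Vector],
    pred_patches: List[Vector],
    target_patches: List[Vector],
    mask: List[bool],
    distill_weight: float = 1.0,  # hypothetical weight on the ImageNet-alignment term
    recon_weight: float = 1.0,    # hypothetical weight on the in-domain reconstruction term
) -> float:
    """Weighted sum of the two assumed objectives."""
    return distill_weight * distillation_loss(student_feats, teacher_feats) + (
        recon_weight * masked_reconstruction_loss(pred_patches, target_patches, mask)
    )
```

In this assumed setup, raising `distill_weight` keeps the student closer to the ImageNet-22k teacher, while raising `recon_weight` gives it more freedom to fit geospatial imagery such as GeoPile; the balance between the two is the design choice such a paradigm has to tune.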