Geospatial models must adapt to the diversity of Earth observation data in terms of resolutions, scales, and modalities. However, existing approaches expect fixed input configurations, which limits their practical applicability. We propose AnySat, a multimodal model based on a joint embedding predictive architecture (JEPA) and scale-adaptive spatial encoders, which allows us to train a single model on highly heterogeneous data in a self-supervised manner. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of $5$ multimodal datasets with varying characteristics and $11$ distinct sensors. We then train a single powerful model on all of these datasets simultaneously. Once fine-tuned or probed, this model reaches state-of-the-art results on the GeoPlex test sets and on $6$ external datasets across various environmental monitoring tasks: land cover mapping, tree species identification, crop type classification, change detection, climate type classification, and segmentation of floods, burn scars, and deforestation. The code and models are available at https://github.com/gastruc/AnySat.
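The abstract does not say how the scale-adaptive spatial encoders or the JEPA objective are implemented. The NumPy sketch below illustrates one plausible reading under explicit assumptions, and it is not AnySat's code (the linked repository has that). Patches are assumed to cover a fixed ground footprint in metres (the hypothetical `patch_m`), so sensors with different pixel resolutions produce token grids of the same shape. A toy JEPA-style loss then compares predicted and target embeddings in latent space rather than reconstructing pixels. The function names, the band-averaging encoder, the averaging fusion of modalities, and the linear predictor are all hypothetical simplifications.

```python
import numpy as np


def patchify(image: np.ndarray, resolution_m: float, patch_m: float) -> np.ndarray:
    """Split a (C, H, W) image into patches covering patch_m x patch_m metres.

    Assumption: the patch side in pixels is p = patch_m / resolution_m, so a
    coarser sensor gets fewer pixels per patch. Returns (H/p, W/p, C*p*p).
    """
    side = patch_m / resolution_m
    p = round(side)
    if p < 1 or abs(side - p) > 1e-6:
        raise ValueError("patch_m must be a whole multiple of resolution_m")
    c, h, w = image.shape
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by the patch side")
    # (C, H, W) -> (C, H/p, p, W/p, p) -> (H/p, W/p, C, p, p)
    patches = image.reshape(c, h // p, p, w // p, p).transpose(1, 3, 0, 2, 4)
    return patches.reshape(h // p, w // p, c * p * p)


def encode_modality(image, resolution_m, patch_m, weight):
    """Hypothetical per-modality encoder: returns a (H/p, W/p, D) token grid."""
    patches = patchify(image, resolution_m, patch_m)
    gh, gw, _ = patches.shape
    c = image.shape[0]
    # Average each band inside a patch, then project the bands to the shared width D.
    pooled = patches.reshape(gh, gw, c, -1).mean(axis=-1)
    return pooled @ weight


def jepa_loss(context, target, mask, predictor):
    """Toy JEPA-style objective on (N, D) tokens; mask is True for hidden tokens.

    The loss is computed between embeddings, not pixels. A real JEPA predictor
    also conditions on target positions; this toy one predicts a single vector.
    """
    summary = context[~mask].mean(axis=0)  # summarize the visible context tokens
    prediction = summary @ predictor       # same prediction for every hidden token
    return float(((target[mask] - prediction) ** 2).mean())


rng = np.random.default_rng(0)
D = 8
s2 = rng.random((12, 12, 12))       # 12 bands, 12 x 12 px at 10 m   -> 120 m tile
aerial = rng.random((4, 240, 240))  # 4 bands, 240 x 240 px at 0.5 m -> 120 m tile
w_s2 = rng.normal(size=(12, D))
w_aerial = rng.normal(size=(4, D))

tok_s2 = encode_modality(s2, 10.0, 40.0, w_s2)              # 4 px patches  -> (3, 3, D)
tok_aerial = encode_modality(aerial, 0.5, 40.0, w_aerial)   # 80 px patches -> (3, 3, D)
tokens = ((tok_s2 + tok_aerial) / 2).reshape(-1, D)         # fuse per cell -> (9, D)

mask = np.zeros(len(tokens), dtype=bool)
mask[[0, 4, 8]] = True
# Here the target tokens come from the same toy encoder; in a JEPA they would
# typically come from a separate (e.g. moving-average) target encoder.
loss = jepa_loss(tokens, tokens, mask, np.eye(D))
print(tok_s2.shape, tok_aerial.shape, tokens.shape, loss >= 0)  # (3, 3, 8) (3, 3, 8) (9, 8) True
```

Tying the patch size to ground distance rather than to pixel count is what lets a 10 m and a 0.5 m image of the same 120 m tile land on the same 3 x 3 token grid, so a single downstream model can consume either or both.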