Managing natural resources and mitigating risks from floods, droughts, wildfires, and landslides require models that can accurately predict climate-driven land-surface responses. Traditional models often struggle with spatial generalization because they are trained or calibrated on limited observations and can degrade under concept drift. Recently proposed vision foundation models trained on satellite imagery demand massive compute and are not designed for dynamic land-surface prediction tasks. We introduce StefaLand, a generative spatiotemporal Earth representation learning model centered on learning cross-domain interactions to suppress overfitting. Compared with previous state-of-the-art methods, StefaLand demonstrates especially strong spatial generalization on five datasets across four important tasks: streamflow, soil moisture, soil composition, and landslides. Its domain-inspired design choices include a location-aware masked autoencoder that fuses static and time-series inputs, an attribute-based rather than image-based representation that drastically reduces compute demands, and residual fine-tuning adapters that strengthen knowledge transfer across tasks. StefaLand can be pretrained and fine-tuned on commonly available academic compute resources, yet it consistently outperforms state-of-the-art supervised learning baselines, fine-tuned vision foundation models, and commercially available embeddings. These results highlight the previously overlooked value of cross-domain interactions and offer support for data-poor regions of the world.