Current Earth observation foundation models are architecturally rigid: they struggle with heterogeneous sensors and are constrained to fixed patch sizes. This limits their deployment in real-world scenarios requiring flexible compute-accuracy trade-offs. We propose THOR, a "compute-adaptive" foundation model that addresses both input heterogeneity and deployment rigidity. THOR is the first architecture to unify data from the Copernicus Sentinel-1, -2, and -3 (OLCI & SLSTR) satellites, processing their native 10 m to 1000 m resolutions in a single model. We pre-train THOR with a novel strategy that randomizes both patch size and input image size. This allows a single set of pre-trained weights to be deployed at inference with any patch size, enabling a dynamic trade-off between computational cost and feature resolution without retraining. We pre-train THOR on THOR Pretrain, a new large-scale multi-sensor dataset, and demonstrate state-of-the-art performance on downstream benchmarks, particularly in data-limited regimes such as the PANGAEA 10% split, validating that THOR's flexible feature generation excels across diverse climate and society applications.
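The core mechanism behind "any patch size at inference" can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not THOR's actual implementation: it resizes a single base patch-embedding kernel (FlexiViT-style nearest-neighbour resampling) so one weight set tokenizes the same image at coarse or fine patch sizes, trading token count (compute) against feature resolution. All function names (`resize_kernel`, `patch_embed`) and the resampling choice are hypothetical.

```python
import numpy as np

def resize_kernel(w, p_new):
    """Nearest-neighbour resize of a (d, p, p) patch-embedding kernel to (d, p_new, p_new).

    A single base kernel, trained at one patch size, is resampled so the
    same weights can embed patches of any size (illustrative assumption).
    """
    d, p, _ = w.shape
    idx = np.arange(p_new) * p // p_new
    return w[:, idx][:, :, idx]

def patch_embed(img, w_base, p):
    """Tokenize an (H, W) image at patch size p with one shared weight set."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "patch size must divide image size"
    k = resize_kernel(w_base, p)  # (d, p, p)
    # Cut the image into non-overlapping p x p patches: (H/p, W/p, p, p)
    patches = img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3)
    # Linear projection of each patch onto the d-dim embedding
    tokens = np.einsum('ijab,dab->ijd', patches, k)
    return tokens.reshape(-1, k.shape[0])  # (num_tokens, d)

rng = np.random.default_rng(0)
w_base = rng.standard_normal((8, 16, 16))  # base kernel at patch size 16
img = rng.standard_normal((64, 64))
# Same weights, three compute budgets: 64, 16, or 4 tokens.
for p in (8, 16, 32):
    print(p, patch_embed(img, w_base, p).shape)
```

Randomizing `p` during pre-training (as the abstract describes) would expose the model to all of these token layouts, which is what lets a single checkpoint serve the full compute-accuracy curve at deployment time.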