Earth embedding models transform Earth observation data into embeddings uniquely tied to locations on the Earth's surface. These models are typically evaluated in isolation, by comparing downstream task performance across different Earth embeddings. However, spatially aligned embeddings can naturally be fused, providing richer information per location, a capability that isolated evaluations fail to capture. We therefore propose assessing Earth embeddings by their complementarity: the performance gain of fused embeddings over the best single-model baseline. To operationalise this, we introduce an embedding complementarity index applicable to any embedding and task, and evaluate four Earth embedding models (AlphaEarth, Tessera, GeoCLIP, SatCLIP) individually, in all pairwise combinations, and all four jointly, across six downstream tasks. Fused embeddings outperform the best single model in four of the six tasks, confirming that single-embedding evaluations often underestimate Earth embedding capabilities. Complementarity proves to be both task- and location-dependent. Further, for a land cover regression task, we find that complementarity is partially determined by the spatial scale of land cover classes. Complementarity reframes Earth embeddings: the greatest future gains may come not from any single Earth embedding model, but from combinations that are better together.
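The notion of complementarity can be illustrated with a minimal sketch. It assumes that fusion is plain concatenation of the per-location vectors and that the index is the simple difference between the fused score and the best single-model score (sign-flipped for error metrics). Both choices are assumptions for illustration only, and the helpers `fuse_by_concatenation` and `complementarity_index` are hypothetical names, not the paper's implementation.

```python
from typing import Dict, List, Sequence, Tuple

Location = Tuple[float, float]  # (lat, lon); hypothetical key format


def fuse_by_concatenation(
    embeddings: Dict[str, Dict[Location, Sequence[float]]], location: Location
) -> List[float]:
    """Concatenate the embeddings that each model assigns to one location.

    Assumed fusion strategy. Models are visited in sorted name order so the
    fused vector layout is deterministic.
    """
    fused: List[float] = []
    for model_name in sorted(embeddings):
        fused.extend(embeddings[model_name][location])
    return fused


def complementarity_index(
    single_scores: Dict[str, float],
    fused_score: float,
    higher_is_better: bool = True,
) -> float:
    """Gain of the fused embedding over the best single-model baseline.

    Positive values mean the combination beats every individual model.
    Assumes a plain score difference; the paper's exact index may differ.
    """
    if not single_scores:
        raise ValueError("need at least one single-model score")
    if higher_is_better:  # e.g. accuracy or R^2
        return fused_score - max(single_scores.values())
    return min(single_scores.values()) - fused_score  # e.g. RMSE or MAE


# Toy usage with made-up numbers (not results from the paper).
toy = {"model_a": {(0.0, 0.0): [0.1, 0.2]}, "model_b": {(0.0, 0.0): [0.3]}}
print(fuse_by_concatenation(toy, (0.0, 0.0)))  # [0.1, 0.2, 0.3]
print(complementarity_index({"model_a": 0.70, "model_b": 0.72}, fused_score=0.75))
```

Comparing against the best single model, rather than the average, means a positive index only arises when fusion adds information that no individual embedding already provides.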