Biotic communities vary continuously across space, yet biome maps impose categorical boundaries that compress this variation, particularly at ecotones where transitional communities are ecologically distinct. Could Earth observation (EO) foundation models, which encode spectral, spatial, and temporal information with dense embeddings, convert discrete biome maps into continuous representations that better capture ecological variation? Here, we fit a linear classifier on Clay v1.5 satellite image embeddings to predict biome labels from a categorical map. The softmax output yields a continuous probability vector whose dimensions correspond to named biome classes. We evaluate this approach using six Brazilian biomes, 1.3 million embeddings, and 10,015 withheld forest inventory plots spanning 4,672 plant species. The continuous biome representation outperforms discrete biome labels for predicting species occurrence (mean per-species AUC 0.618 vs. 0.570 across 10 spatial cross-validation folds). Decomposing this gain shows that continuity in the graded probability output, rather than label reassignment, accounts for the improvement; the pattern holds across all distances from biome boundaries. The raw 1024-dimensional embedding remains the strongest predictor we tested (mean AUC 0.646 vs. 0.618), but the continuous representation recovers most of the embedding's gain over discrete labels. This simple approach provides a probabilistic replacement for categorical map labels, preserving their meaning while encoding graded variation that discrete maps suppress.
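The core step described above, a linear classifier whose softmax output serves as a continuous biome representation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random synthetic vectors stand in for Clay v1.5 embeddings (16 dimensions rather than 1024), the six classes are unnamed placeholders, the function names `fit_linear_softmax` and `biome_probabilities` are hypothetical, and plain gradient descent is assumed as the fitting procedure because the abstract does not specify one.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_linear_softmax(X, y, n_classes, lr=0.1, n_iter=500, l2=1e-3):
    """Fit a linear (multinomial logistic) classifier by gradient descent.

    X: (n_samples, n_features) embeddings; y: integer class labels.
    Hyperparameters are illustrative assumptions, not values from the paper.
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoding of the categorical map labels
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def biome_probabilities(X, W, b):
    """Continuous representation: one probability per named biome class."""
    return softmax(X @ W + b)


# Synthetic stand-in data (NOT real embeddings): six well-separated clusters.
rng = np.random.default_rng(0)
n_classes, dim, n_per_class = 6, 16, 200
centers = rng.normal(size=(n_classes, dim)) * 2.0
X = np.vstack([c + rng.normal(size=(n_per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Standardize features so the fixed learning rate behaves predictably.
X = (X - X.mean(axis=0)) / X.std(axis=0)

W, b = fit_linear_softmax(X, y, n_classes)
probs = biome_probabilities(X, W, b)  # graded, continuous output
hard_labels = probs.argmax(axis=1)    # discrete labels, for comparison
```

Each row of `probs` sums to one and keeps the meaning of the original classes, while `hard_labels` discards the graded information; the comparison reported in the abstract is between predictors built from these two kinds of output.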