Geospatial foundation models (GFMs) have been proposed as generalizable backbones for disaster response, land-cover mapping, food-security monitoring, and other high-stakes Earth-observation tasks. Yet the published literature on these models does not give reviewers or users enough information to tell which model fits a given task. We argue that nobody currently knows what the state of the art in geospatial foundation models is. The methods may be useful, but the GFM literature does not standardize evaluations, training and testing protocols, weight releases, or pretraining controls well enough for anyone to compare or rank the models. In an audit of 152 papers, we find 46 cross-paper disagreements of at least 10 points for the same model, benchmark, and protocol; 94 of the 126 papers with extractable pretraining data use a configuration that no other paper uses; and 39% of GFM papers release no model weights. This lack of community standards is fixable. We propose six concrete expectations: named-license weight release, shared core evaluations, copied-versus-rerun baseline annotations, variance reporting, one shared evaluation harness, and data-vs-architecture-vs-algorithm controls. These gaps are a coordination failure, not a fault of any individual lab; the authors of this paper, like many others in the GFM community, have contributed to them. Rather than only critiquing the community, we aim to provide concrete steps toward a shared understanding of how to advance GFMs.