GeoAI Reproducibility and Replicability: a computational and spatial perspective

GeoAI has emerged as an exciting interdisciplinary research area that combines spatial theories and data with cutting-edge AI models to address geospatial problems in a novel, data-driven manner. While GeoAI research has flourished in the GIScience literature, its reproducibility and replicability (R&R), fundamental principles that determine the reusability, reliability, and scientific rigor of research findings, have rarely been discussed. This paper aims to provide an in-depth analysis of this topic from both computational and spatial perspectives. We first categorize the major goals for reproducing GeoAI research: validation (repeatability); learning and adapting the method to solve a similar or new problem (reproducibility); and examining the generalizability of the research findings (replicability). Each of these goals requires a different level of understanding of GeoAI, as well as different methods to ensure its success. We then discuss the factors that may cause a lack of R&R in GeoAI research, with an emphasis on (1) the selection and use of training data; (2) the uncertainty that resides in the GeoAI model design, training, deployment, and inference processes; and, more importantly, (3) the inherent spatial heterogeneity of geospatial data and processes. Using a deep learning-based image analysis task as an example, we demonstrate the uncertainty and spatial variance in results caused by these different factors. The findings reiterate the importance of knowledge sharing, as well as of generating a "replicability map" that takes spatial autocorrelation and spatial heterogeneity into consideration when quantifying the spatial replicability of GeoAI research.
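A "replicability map" that accounts for spatial autocorrelation needs some way to measure whether replication outcomes cluster in space. One common global statistic for this is Moran's I. The sketch below is not the paper's method. It assumes hypothetical per-region replication scores (for example, the accuracy a re-run model achieves in each region) and a binary contiguity neighbour list, both of which are illustrative inputs. It then computes global Moran's I, where values above 0 suggest that similar scores sit next to each other.

```python
"""Hypothetical sketch: global Moran's I over per-region replication scores.

Assumption (not from the paper): each region has one score, e.g. the accuracy
a GeoAI model reaches when replicated there, and neighbours are given as a
binary contiguity mapping (w_ij = 1 if j is listed as a neighbour of i).
"""
from typing import Dict, List, Sequence


def morans_i(values: Sequence[float], neighbors: Dict[int, List[int]]) -> float:
    """Global Moran's I with binary spatial weights.

    I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i = x_i - mean
    and S0 is the total weight (the number of neighbour pairs listed).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviations z_i from the mean
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    num = 0.0  # cross-product term sum_i sum_j w_ij * z_i * z_j
    s0 = 0     # total weight S0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            num += dev[i] * dev[j]
            s0 += 1
    if s0 == 0:
        raise ValueError("at least one neighbour pair is required")
    return (n / s0) * (num / denom)


if __name__ == "__main__":
    # Hypothetical example: four regions in a row (0-1-2-3); the model
    # replicates well in the west and poorly in the east.
    scores = [0.90, 0.85, 0.60, 0.55]
    line_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(round(morans_i(scores, line_neighbors), 4))  # 0.4054
```

The positive value reflects that high and low replication scores form spatial clusters rather than being scattered at random. A local statistic (such as local Moran's I) would be needed to map where those clusters are, which is closer to what a replicability map would display.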
