Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been developed solely on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Moreover, considering the entire machine learning cycle, from problem definition to model deployment with feedback, is crucial for enhancing the reliability of machine learning models in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric learning in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples, which provide actionable steps for applying data-centric machine learning approaches to geospatial data.