Aligning Validation with Deployment in Spatial Prediction: Target-Weighted Cross-Validation

Reliable estimation of predictive performance is essential for spatial environmental modeling, where machine-learning models are used to generate maps from unevenly distributed observations. Standard cross-validation (CV) assumes that validation data are representative of prediction conditions across the target domain. In practice, this assumption is often violated due to preferential or clustered sampling, leading to biased performance and uncertainty estimates. We introduce a deployment-oriented validation framework based on weighted CV that aligns validation tasks with the distribution of prediction tasks across a specified domain. The framework includes importance-weighted cross-validation (IWCV) and a calibration-based approach, Target-Weighted Cross-Validation (TWCV), which uses spatially meaningful task descriptors such as environmental covariates and prediction distance. Simulation experiments show that conventional non-spatial and spatial CV strategies can exhibit substantial bias under realistic sampling designs, whereas weighted CV approaches substantially reduce this bias when validation tasks adequately cover the deployment-task space. A case study on mapping nitrogen dioxide (NO$_2$) concentrations across Germany demonstrates that standard CV can overestimate prediction error due to sampling bias, while weighted CV yields estimates more consistent with deployment conditions. The framework separates validation task generation from risk estimation and provides a practical approach for improving performance assessment in spatial prediction settings where sample distributions differ from prediction domains.
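The reweighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' IWCV or TWCV implementation: it assumes a single covariate, estimates weights as the ratio of target share to validation share within histogram bins, and uses made-up loss values. The function names `bin_ratio_weights` and `weighted_risk` are hypothetical.

```python
import numpy as np

def bin_ratio_weights(x_val, x_target, edges):
    """Weight each validation point by (target share) / (validation share) of its bin."""
    val_bin = np.digitize(x_val, edges)
    tgt_bin = np.digitize(x_target, edges)
    n_bins = len(edges) + 1
    val_share = np.bincount(val_bin, minlength=n_bins) / len(x_val)
    tgt_share = np.bincount(tgt_bin, minlength=n_bins) / len(x_target)
    # Any bin that contains a validation point has val_share > 0,
    # so the division below is safe for the indexed entries.
    return tgt_share[val_bin] / val_share[val_bin]

def weighted_risk(losses, weights):
    """Self-normalized weighted mean of per-point held-out losses."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * losses) / np.sum(weights))

# Toy data (illustrative only): validation points cluster at low covariate
# values, while the prediction domain is split evenly between low and high.
x_val = np.array([0.1, 0.2, 0.3, 0.4, 0.9])    # 4 low, 1 high
cv_loss = np.array([1.0, 1.0, 1.0, 1.0, 5.0])  # made-up held-out squared errors
x_target = np.array([0.1, 0.3, 0.7, 0.9])      # 2 low, 2 high
edges = np.array([0.5])                        # one split point, two bins

w = bin_ratio_weights(x_val, x_target, edges)
unweighted = float(cv_loss.mean())
weighted = weighted_risk(cv_loss, w)
print(unweighted, weighted)
```

In this toy case the unweighted mean is dominated by the over-sampled low-covariate region, and the weighted estimate shifts toward the under-sampled region. The direction of the correction depends on the data. A target bin with no validation points contributes nothing to the estimate, which is one way to see why the abstract conditions its result on validation tasks covering the deployment-task space.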


Related content

Cross-validation

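Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of several similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent dataset. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which it is trained (the training dataset) and a dataset of unknown, or first-seen, data against which it is tested (the validation dataset or test set). The goal of cross-validation is to test the model's ability to predict new data that were not used to fit it, in order to flag problems such as overfitting or selection bias and to give insight into how the model will generalize to an independent dataset (for example, an unknown dataset from a real problem). One round of cross-validation partitions a sample of data into complementary subsets, performs the analysis on one subset (the training set), and validates the analysis on the other (the validation set or test set). To reduce variability, most methods perform multiple rounds of cross-validation using different partitions and combine the validation results across rounds, for example by averaging, to estimate the model's predictive performance. In short, cross-validation averages measures of predictive fit to obtain a more accurate estimate of a model's predictive performance.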
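The partition, fit, validate, and average procedure described above can be sketched as plain k-fold cross-validation. This is a minimal illustration under assumed choices: a straight-line model fitted with `np.polyfit`, mean squared error as the loss, and noise-free toy data. The function name `k_fold_mse` is hypothetical.

```python
import numpy as np

def k_fold_mse(x, y, k=5, seed=0):
    """Average held-out mean squared error of a straight-line fit over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))    # shuffle before partitioning
    folds = np.array_split(idx, k)   # k complementary subsets
    fold_mse = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training subset only (coefficients: highest degree first).
        slope, intercept = np.polyfit(x[train], y[train], deg=1)
        pred = slope * x[test] + intercept
        # Validate on the held-out subset.
        fold_mse.append(np.mean((y[test] - pred) ** 2))
    # Combine the rounds by averaging.
    return float(np.mean(fold_mse))

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0                    # noise-free toy data (illustrative)
mse = k_fold_mse(x, y, k=5)
print(mse)
```

Because the toy data lie exactly on a line, the held-out error is zero up to floating-point rounding. With noisy or spatially clustered data, the same loop yields the unweighted estimate that the weighted variants in the abstract aim to correct.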
