Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction

Cross-validation (CV) is commonly used to estimate predictive risk when independent test data are unavailable. Its validity rests on the assumption that validation tasks are drawn from the same distribution as the prediction tasks encountered during deployment. In spatial prediction and other settings with structured data, this assumption is frequently violated, which biases estimates of deployment risk. We propose Target-Weighted CV (TWCV), an estimator of deployment risk that corrects for discrepancies between the validation and deployment task distributions, covering both (1) covariate shift and (2) task-difficulty shift. We characterize prediction tasks by descriptors such as covariates and spatial configuration. TWCV assigns weights to validation losses such that the weighted empirical distribution of validation task descriptors matches the corresponding distribution over the target domain. The weights are obtained via calibration weighting, yielding an importance-weighted estimator that targets deployment risk. Because TWCV requires adequate coverage of the support of the deployment distribution, we combine it with spatially buffered resampling, which diversifies the distribution of task difficulty among validation tasks. In a simulation study, conventional and spatial CV estimators both exhibit substantial bias that depends on the sampling design, whereas buffered TWCV remains approximately unbiased across scenarios. A case study in environmental pollution mapping further confirms that discrepancies between validation and deployment task distributions can affect performance assessment, and that buffered TWCV better reflects the prediction task over the target domain. These results establish task distribution mismatch as a primary source of CV bias in spatial prediction and show that calibration weighting, combined with a suitable validation task generator, provides a viable approach to estimating predictive risk under dataset shift.
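
The weighting step can be illustrated with a minimal sketch. The abstract does not say which calibration-weighting scheme is used, so the code below assumes entropy balancing on first moments only: the weights are an exponential tilt of the uniform weights, chosen so that the weighted mean of the validation task descriptors equals the target-domain mean. The function names (`calibration_weights`, `target_weighted_risk`), the two descriptors, and the synthetic losses are all hypothetical and are not taken from the paper.

```python
import numpy as np


def calibration_weights(desc_val, target_mean, n_iter=100, tol=1e-10):
    """Return weights w (w >= 0, sum to 1) with w @ desc_val ~= target_mean.

    Assumed scheme (not from the paper): entropy balancing on first moments.
    w_i is proportional to exp(z_i @ lam), where z_i is the descriptor centred
    at the target mean and lam minimises the convex dual log(sum_i exp(z_i @ lam)).
    """
    z = np.asarray(desc_val, dtype=float) - np.asarray(target_mean, dtype=float)

    def dual(lam):
        s = z @ lam
        return s.max() + np.log(np.exp(s - s.max()).sum())

    def weights(lam):
        s = z @ lam
        w = np.exp(s - s.max())  # subtract the max for numerical stability
        return w / w.sum()

    lam = np.zeros(z.shape[1])
    for _ in range(n_iter):
        w = weights(lam)
        grad = w @ z  # weighted mean of the centred descriptors
        if np.linalg.norm(grad) < tol:
            break
        # Hessian of the dual is the weighted covariance of z.
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)
        step = np.linalg.solve(hess + 1e-12 * np.eye(lam.size), grad)
        # Damped Newton step: halve until the dual no longer increases.
        t = 1.0
        while dual(lam - t * step) > dual(lam) + 1e-12 and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return weights(lam)


def target_weighted_risk(losses, w):
    """Weighted average of per-task validation losses."""
    return float(np.dot(w, losses))


# Synthetic illustration (hypothetical data, not results from the paper).
rng = np.random.default_rng(0)
# Column 0: a covariate; column 1: a task-difficulty descriptor,
# e.g. a standardised log distance to the nearest training point.
desc_val = rng.normal(size=(500, 2))
# Validation losses that grow with task difficulty.
losses = np.exp(0.5 * desc_val[:, 1])
# Descriptor means over the target domain: harder tasks on average.
target_mean = np.array([0.2, 0.5])

w = calibration_weights(desc_val, target_mean)
print("unweighted CV risk:", losses.mean())
print("target-weighted risk:", target_weighted_risk(losses, w))
```

Matching only the descriptor means is the simplest possible calibration constraint; matching higher moments or binned distributions would follow the same pattern with additional columns in `desc_val`. The weights exist only if the target mean lies inside the convex hull of the validation descriptors, which is one way to read the coverage requirement that motivates buffered resampling.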
