TRACE: Theoretical Risk Attribution under Covariate-shift Effects

When a source-trained model $Q$ is replaced by a model $\tilde{Q}$ trained on shifted data, performance on the source domain can change unpredictably. To address this, we study the two-model risk change, $\Delta R := R_P(\tilde{Q}) - R_P(Q)$, under covariate shift. We introduce TRACE (Theoretical Risk Attribution under Covariate-shift Effects), a framework that decomposes $|\Delta R|$ into an interpretable upper bound. The decomposition separates the risk change into four actionable factors: two generalization gaps, a model change penalty, and a covariate shift penalty. This turns the bound into a diagnostic tool for understanding why performance has changed. To make TRACE fully computable, we instantiate each term. The covariate shift penalty is estimated from a model sensitivity factor (based on high-quantile input gradients) and a data-shift measure; we use feature-space Optimal Transport (OT) by default and provide a robust alternative based on Maximum Mean Discrepancy (MMD). The model change penalty is controlled by the average output distance between the two models on the target sample. The generalization gaps are estimated on held-out data. We validate the framework in an idealized linear regression setting, showing that the TRACE bound correctly captures how the true risk difference scales with the magnitude of the shift. Across synthetic and vision benchmarks, TRACE diagnostics are valid and maintain a strong monotonic relationship with the true performance degradation. Finally, we derive a deployment gate score that correlates strongly with $|\Delta R|$ and achieves high AUROC/AUPRC for gating decisions, enabling safe, label-efficient model replacement.
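To make the four ingredients concrete, the following is a minimal sketch in a one-dimensional regression toy problem. It is not the paper's implementation. Everything in it is an assumption made for illustration: the quadratic ground truth (chosen so that a linear fit depends on the input distribution), the squared loss, the function names, the 0.95 quantile, the use of loss gradients with respect to the input as the sensitivity proxy, and the use of raw inputs as the "feature space" for a 1-D Wasserstein-1 distance. The sketch only computes the individual quantities and does not combine them, since the exact form of the bound is not stated here.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sample(n, shift, rng):
    """Draw x ~ N(shift, 1) and y = x + 0.5*x^2 + noise (illustrative ground truth)."""
    xs = [rng.gauss(shift, 1.0) for _ in range(n)]
    ys = [x + 0.5 * x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y ~ w*x + b in one dimension."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def risk(model, xs, ys):
    """Empirical squared-loss risk of a linear model (w, b)."""
    w, b = model
    return mean([(w * x + b - y) ** 2 for x, y in zip(xs, ys)])


def grad_quantile(model, xs, ys, q=0.95):
    """q-quantile of |d loss / d x|; for squared loss this is 2*|residual|*|w|."""
    w, b = model
    grads = sorted(abs(2.0 * (w * x + b - y) * w) for x, y in zip(xs, ys))
    return grads[min(len(grads) - 1, int(q * len(grads)))]


def wasserstein1_1d(a, b):
    """Exact W1 between two equal-size 1-D empirical samples (sorted matching)."""
    assert len(a) == len(b)
    return mean([abs(u - v) for u, v in zip(sorted(a), sorted(b))])


def diagnostics(shift, n=2000, seed=0):
    rng = random.Random(seed)
    src_train, src_held = sample(n, 0.0, rng), sample(n, 0.0, rng)
    tgt_train, tgt_held = sample(n, shift, rng), sample(n, shift, rng)

    q = fit_line(*src_train)        # source-trained model Q
    q_tilde = fit_line(*tgt_train)  # replacement model trained on shifted data

    return {
        # Two-model risk change on the source domain, estimated on held-out data.
        "delta_r": risk(q_tilde, *src_held) - risk(q, *src_held),
        # Generalization gaps: held-out risk versus training risk for each model.
        "gap_q": abs(risk(q, *src_held) - risk(q, *src_train)),
        "gap_q_tilde": abs(risk(q_tilde, *tgt_held) - risk(q_tilde, *tgt_train)),
        # Model change: average output distance between the models on target inputs.
        "model_change": mean(
            [abs((q_tilde[0] - q[0]) * x + (q_tilde[1] - q[1])) for x in tgt_held[0]]
        ),
        # Sensitivity proxy: high quantile of input-gradient magnitudes (assumed form).
        "sensitivity": grad_quantile(q_tilde, *tgt_held),
        # Data shift: 1-D optimal transport distance between source and target inputs.
        "shift_w1": wasserstein1_1d(src_held[0], tgt_held[0]),
    }


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 2.0):
        d = diagnostics(s)
        print(s, {k: round(v, 3) for k, v in d.items()})
```

Running the script prints the estimated quantities for a few shift magnitudes, which makes it easy to see how each term moves as the target distribution drifts away from the source. In higher dimensions the sorted-matching shortcut no longer applies, and a general OT solver or an MMD estimate would be needed instead.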
