Importance weighting is a standard tool for correcting distribution shift, but its statistical behavior under target shift, where the label distribution changes between training and testing while the conditional distribution of inputs given the label stays fixed, remains under-explored. We analyze importance-weighted kernel ridge regression under target shift and show that, because the weights depend only on the output variable, reweighting corrects the train-test mismatch without altering the input-space complexity that governs kernel generalization. Under standard RKHS regularity and capacity conditions, together with a mild Bernstein-type moment condition on the label weights, we obtain finite-sample guarantees showing that the estimator attains the same convergence rates as in the no-shift case, with shift severity affecting only the constants through moments of the weights. We complement these upper bounds with matching minimax lower bounds, establishing rate optimality and quantifying the unavoidable dependence on shift severity. We further study more general weighting schemes and prove that weight misspecification induces an irreducible bias: the estimator concentrates around an induced population regression function that, unless the weights are accurate, differs in general from the desired test regression function. Finally, we derive consequences for plug-in classification under target shift via standard calibration arguments.
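To make the estimator concrete, the following is a minimal NumPy sketch of importance-weighted kernel ridge regression with label-dependent weights. It is an illustration rather than the paper's implementation; the Gaussian kernel, the regularization level, and the assumption that the label-weight ratio w(y) = q(y)/p(y) is already known (or has been estimated) are choices made for the example.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def weighted_krr_fit(X, y, w, lam=1e-2, gamma=1.0):
    """Importance-weighted kernel ridge regression.

    Minimizes  (1/n) * sum_i w_i * (f(x_i) - y_i)^2  +  lam * ||f||_H^2
    over the RKHS H of the kernel.  By the representer theorem the
    solution is f(.) = sum_i alpha_i * k(., x_i) with
        alpha = (W K + n * lam * I)^{-1} W y,
    where K is the kernel matrix and W = diag(w).
    """
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    A = w[:, None] * K + n * lam * np.eye(n)  # = W K + n*lam*I
    return np.linalg.solve(A, w * y)

def weighted_krr_predict(alpha, X_train, X_test, gamma=1.0):
    """Evaluate the fitted function at test inputs."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Example: binary labels under a known label-marginal shift, with a
# shared class-conditional input distribution p(x | y) (target shift).
rng = np.random.default_rng(0)
n = 200
y = rng.binomial(1, 0.8, size=n).astype(float)        # train: P(y=1) = 0.8
X = rng.normal(loc=2 * y[:, None] - 1, size=(n, 1))   # p(x | y) unchanged
q1, p1 = 0.3, 0.8                                     # test: Q(y=1) = 0.3
w = np.where(y == 1, q1 / p1, (1 - q1) / (1 - p1))    # w(y) = q(y) / p(y)
alpha = weighted_krr_fit(X, y, w, lam=1e-2, gamma=0.5)
```

Note that the weight vector w is a function of y alone, which is the structural fact behind the abstract's claim: reweighting fixes the label-marginal mismatch while leaving the input-space kernel quantities, and hence the attainable rates, unchanged.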