Covariate shift, in which the input distributions of the source and target data differ substantially, is prevalent in practice. Despite its practical importance across many learning problems, most existing methods focus only on specific learning tasks and lack thorough theoretical and numerical validation. To address this gap, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results hold for a general loss belonging to a rich family of loss functions, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. We focus on two types of covariate shift problems and establish sharp convergence rates for a general loss function, providing a unified theoretical analysis that agrees with the optimal results in the literature for the squared loss. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of the proposed method.
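The abstract does not spell out the estimators, so the following is an illustration only and not the authors' method. It sketches one standard way a kernel method can be adapted to covariate shift: importance-weighted kernel ridge regression under the squared loss, with the density ratio between target and source inputs assumed known. The Gaussian kernel, the source and target distributions, the regularization parameter `lam`, and all function names (`fit_weighted_krr`, `predict`, etc.) are hypothetical choices. For the squared loss the representer theorem yields a closed-form linear system; other losses in the family (quantile, logistic, hinge) would instead require an iterative optimizer.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between 1-D sample arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def fit_weighted_krr(x, y, weights, lam=1e-2, bandwidth=1.0):
    """Importance-weighted kernel ridge regression (squared loss).

    Minimises (1/n) * sum_i w_i (y_i - f(x_i))^2 + lam * ||f||_H^2.
    By the representer theorem f = sum_j alpha_j k(., x_j), and a
    minimiser is obtained by solving (W K + n*lam*I) alpha = W y.
    """
    n = len(x)
    K = gaussian_kernel(x, x, bandwidth)
    W = np.diag(weights)
    return np.linalg.solve(W @ K + n * lam * np.eye(n), W @ y)


def predict(alpha, x_train, x_new, bandwidth=1.0):
    """Evaluate f(x_new) = sum_j alpha_j k(x_new, x_train_j)."""
    return gaussian_kernel(x_new, x_train, bandwidth) @ alpha


# Hypothetical setup: source X ~ N(0, 1), target X ~ N(1.5, 0.5^2),
# with the same regression function f*(x) = sin(x) in both domains.
rng = np.random.default_rng(0)
n = 200
x_src = rng.normal(0.0, 1.0, n)
y_src = np.sin(x_src) + 0.1 * rng.normal(size=n)

# Density ratio w(x) = p_target(x) / p_source(x), assumed known here.
w = normal_pdf(x_src, 1.5, 0.5) / normal_pdf(x_src, 0.0, 1.0)

alpha_w = fit_weighted_krr(x_src, y_src, w)           # importance-weighted fit
alpha_u = fit_weighted_krr(x_src, y_src, np.ones(n))  # ordinary (unweighted) KRR

# Evaluate both fits on fresh target-domain inputs.
x_tgt = rng.normal(1.5, 0.5, 1000)
for name, a in [("weighted", alpha_w), ("unweighted", alpha_u)]:
    mse = np.mean((predict(a, x_src, x_tgt) - np.sin(x_tgt)) ** 2)
    print(f"{name}: target MSE = {mse:.4f}")
```

Setting all weights to one recovers ordinary kernel ridge regression, so the same routine produces both the weighted and the unweighted estimators. In practice the density ratio is unknown and must be estimated, and unbounded ratios are often truncated; both points are beyond this sketch.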