Covariate shift, in which the input distributions of the source and target data differ substantially, is prevalent in practice. Despite its practical importance across learning problems, most existing methods focus on specific learning tasks and are not well validated either theoretically or numerically. To address this gap, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich family of loss functions, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. We focus on two types of covariate shift problems and establish sharp convergence rates for a general loss function, yielding a unified theoretical analysis that concurs with the optimal results in the literature for the squared loss. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of the proposed method.