We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR does not require full knowledge of likelihood ratios apart from an upper bound on them. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naive estimator, which minimizes the empirical risk over the function class, is strictly sub-optimal under covariate shift as compared to KRR. We then address the larger class of covariate shift problems where the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. Again, we are able to show that this estimator is minimax rate-optimal, up to logarithmic factors.
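As a rough illustration of the reweighted estimator described above, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel, assumes that "truncation" means clipping each likelihood ratio at a level `tau`, and uses hypothetical names throughout (`gaussian_kernel`, `truncate`, `fit_weighted_krr`, `predict`, and the parameters `lam`, `tau`, `bandwidth`). The toy data take the source as N(0, 1) and the target as N(1, 1), so the likelihood ratio exp(x - 1/2) is unbounded but has a finite second moment under the source. The values of `lam` and `tau` are arbitrary placeholders, not the tuned choices analyzed in the paper.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A (m x d) and rows of B (n x d)."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def truncate(ratios, tau):
    """Clip likelihood ratios at level tau (assumed form of the truncation)."""
    return np.minimum(ratios, tau)


def fit_weighted_krr(X, y, weights, lam, bandwidth=1.0):
    """Minimize (1/n) * sum_i w_i (y_i - f(x_i))^2 + lam * ||f||_H^2.

    Writing f = sum_j alpha_j k(., x_j), the coefficients solve
    (W K + n * lam * I) alpha = W y, with W = diag(weights).
    Setting all weights to 1 recovers ordinary (unweighted) KRR.
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    A = weights[:, None] * K + n * lam * np.eye(n)
    return np.linalg.solve(A, weights * y)


def predict(X_train, alpha, X_new, bandwidth=1.0):
    """Evaluate the fitted function at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha


# Toy covariate shift: source covariates ~ N(0, 1), target ~ N(1, 1).
rng = np.random.default_rng(0)
n = 200
X_src = rng.normal(loc=0.0, scale=1.0, size=(n, 1))
y_src = np.sin(2.0 * X_src[:, 0]) + 0.1 * rng.normal(size=n)

# Target-to-source density ratio for this Gaussian pair: exp(x - 1/2).
ratios = np.exp(X_src[:, 0] - 0.5)

lam = 1e-2   # placeholder regularization parameter
tau = 5.0    # placeholder truncation level

alpha_plain = fit_weighted_krr(X_src, y_src, np.ones(n), lam)
alpha_rw = fit_weighted_krr(X_src, y_src, truncate(ratios, tau), lam)

# Compare the two fits on fresh covariates drawn from the target distribution.
X_tgt = rng.normal(loc=1.0, scale=1.0, size=(1000, 1))
f_star = np.sin(2.0 * X_tgt[:, 0])
mse_plain = np.mean((predict(X_src, alpha_plain, X_tgt) - f_star) ** 2)
mse_rw = np.mean((predict(X_src, alpha_rw, X_tgt) - f_star) ** 2)
print(f"target MSE, unweighted KRR: {mse_plain:.4f}")
print(f"target MSE, truncated-reweighted KRR: {mse_rw:.4f}")
```

Passing a vector of ones as `weights` gives the unweighted KRR estimator, which corresponds to the bounded-likelihood-ratio setting where only the regularization parameter is adjusted. Which fit has the smaller target error on a single toy draw depends on `lam`, `tau`, and the sample, so the printed numbers should not be read as evidence for the rates stated above.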