Covariate shift is a common transfer learning scenario in which the marginal distributions of the input variables differ between source and target data while the conditional distribution of the output variable remains the same. Existing notions for describing differences between marginal distributions are limited when the support is unbounded, particularly when the target distribution has a heavier tail. To overcome these limitations, we introduce a new concept, the density ratio exponent, which quantifies the relative decay rates of the tails of the marginal distributions under covariate shift. Furthermore, we propose the local k-nearest neighbour regressor for transfer learning, which adapts the number of nearest neighbours to the marginal likelihood of each test sample. From a theoretical perspective, we establish convergence rates with and without supervision information on the target domain. These rates indicate that our estimator converges faster when the density ratio exponent satisfies certain conditions, highlighting the benefit of using density estimation to choose a different number of nearest neighbours for each test sample. Our contributions enhance the understanding and applicability of transfer learning under covariate shift, especially in scenarios with unbounded support and heavy-tailed distributions.
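The idea of adapting the number of nearest neighbours to the marginal likelihood of each test sample can be illustrated with a minimal sketch. The abstract does not state the actual selection rule, so everything below is an assumption made for illustration: the function names `gaussian_kde_density` and `local_knn_predict`, the parameters `k_base`, `bandwidth`, and `k_min`, the use of a Gaussian kernel density estimate, and the rule that makes k proportional to the estimated density (more neighbours where the source data are dense, fewer in the tails). It is not the estimator analysed in the paper.

```python
import numpy as np


def gaussian_kde_density(train_x, query_x, bandwidth):
    """Gaussian kernel density estimate of the source marginal at each query point."""
    n, d = train_x.shape
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    sq_dists = np.sum((query_x[:, None, :] - train_x[None, :, :]) ** 2, axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).sum(axis=1) / (n * norm)


def local_knn_predict(train_x, train_y, query_x, k_base=20, bandwidth=0.5, k_min=1):
    """k-NN regression with a per-query k driven by the estimated density.

    Assumed (hypothetical) rule: k(x) = ceil(k_base * p_hat(x) / max p_hat),
    clipped to the range [k_min, n_train].
    """
    density = gaussian_kde_density(train_x, query_x, bandwidth)
    # Guard against division by zero when every query lies far from the data.
    scale = max(density.max(), np.finfo(float).tiny)
    ks = np.ceil(k_base * density / scale).astype(int)
    ks = np.clip(ks, k_min, len(train_x))

    sq_dists = np.sum((query_x[:, None, :] - train_x[None, :, :]) ** 2, axis=-1)
    order = np.argsort(sq_dists, axis=1)  # neighbours sorted by distance
    preds = np.array(
        [train_y[order[i, : ks[i]]].mean() for i in range(len(query_x))]
    )
    return preds, ks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Source covariates with light tails; one query in the bulk, one in the tail.
    source_x = rng.normal(size=(500, 1))
    source_y = np.sin(source_x[:, 0]) + 0.1 * rng.normal(size=500)
    queries = np.array([[0.0], [6.0]])
    preds, ks = local_knn_predict(source_x, source_y, queries)
    print("neighbours used per query:", ks)
    print("predictions:", preds)
```

Under this assumed rule, a test point in the bulk of the source distribution averages over many neighbours, while a point far out in the tail falls back to very few. Whether k should increase or decrease with density, and at what rate, is exactly what a quantity such as the density ratio exponent would need to inform; the sketch fixes one choice only to make the mechanism concrete.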