Transfer learning enhances prediction accuracy on a target distribution by leveraging data from a source distribution, demonstrating significant benefits in various applications. This paper introduces a novel dissimilarity measure that utilizes vicinity information, i.e., the local structure of data points, to analyze the excess error in classification under covariate shift, a transfer learning setting where marginal feature distributions differ but conditional label distributions remain the same. We characterize the excess error using the proposed measure and demonstrate faster or competitive convergence rates compared to previous techniques. Notably, our approach remains effective when the support non-containment assumption holds, a setting that often arises in real-world applications. Our theoretical analysis bridges the gap between current theoretical findings and empirical observations in transfer learning, particularly in scenarios with significant differences between source and target distributions.