We introduce a transfer learning framework for regression that leverages heterogeneous source domains to improve predictive performance in a data-scarce target domain. Our approach learns a conditional generative model separately for each source domain and calibrates the generated responses to the target domain via conditional quantile matching. This distributional alignment step corrects general discrepancies between source and target domains without imposing restrictive assumptions such as covariate or label shift. The resulting framework provides a principled and flexible approach to high-quality data augmentation for downstream learning tasks in the target domain. From a theoretical perspective, we show that an empirical risk minimizer (ERM) trained on the augmented dataset achieves a tighter excess risk bound than the target-only ERM under mild conditions. In particular, we establish new convergence rates for the quantile matching estimator that governs the transfer bias-variance tradeoff. From a practical perspective, extensive simulations and real data applications demonstrate that the proposed method consistently improves prediction accuracy over target-only learning and competing transfer learning methods.
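The calibration step above can be illustrated with a minimal sketch of quantile matching. This is an unconditional simplification for intuition only (the proposed method matches quantiles conditionally on covariates, via an estimator whose convergence rate is analyzed in the paper); all data and function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: source responses are shifted and scaled relative
# to the target distribution.
y_source = rng.normal(loc=2.0, scale=2.0, size=5000)   # abundant source data
y_target = rng.normal(loc=0.0, scale=1.0, size=200)    # scarce target data

def quantile_match(y_new, y_src_ref, y_tgt_ref):
    """Map values through the empirical source CDF, then invert the
    empirical target CDF at the resulting ranks."""
    # Rank of each value within the source reference sample (empirical CDF)
    u = np.searchsorted(np.sort(y_src_ref), y_new, side="right") / len(y_src_ref)
    u = np.clip(u, 1e-6, 1 - 1e-6)
    # Empirical target quantile function evaluated at those ranks
    return np.quantile(y_tgt_ref, u)

# Calibrated responses now follow (approximately) the target distribution
y_calibrated = quantile_match(y_source, y_source, y_target)
```

After calibration, the source responses are distributionally aligned with the target sample, which is what permits them to serve as augmentation data without assuming covariate or label shift.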