Transfer learning aims to improve inference in a target domain by leveraging information from related source domains, but its effectiveness critically depends on how cross-domain heterogeneity is modeled and controlled. When the conditional mechanism linking covariates and responses varies across domains, indiscriminate information pooling can lead to negative transfer, degrading performance relative to target-only estimation. We study a multi-source, single-target transfer learning problem under conditional distributional drift and propose a semiparametric domain-varying coefficient model (DVCM), in which domain-relatedness is encoded through an observable domain identifier. This framework generalizes classical varying-coefficient models to structured transfer learning and interpolates between invariant and fully heterogeneous regimes. Building on this model, we develop an adaptive transfer learning estimator that selectively borrows strength from informative source domains while provably safeguarding against negative transfer. The estimator is computationally efficient and easy to implement; we further show that it is minimax rate-optimal and derive its asymptotic distribution, enabling valid uncertainty quantification and hypothesis testing despite data-adaptive pooling and shrinkage. Our results precisely characterize the interplay among domain heterogeneity, the smoothness of the underlying mean function, and the number of source domains; they are corroborated by comprehensive numerical experiments and two real-data applications.