We propose a transfer learning method that uses data representations in a semiparametric regression model. Our aim is to perform statistical inference on the parameter of primary interest in the target model while accounting for potential nonlinear effects of confounding variables. We leverage knowledge from source domains, assuming that the source sample size is substantially larger than the target sample size. Knowledge is transferred by sharing data representations, on the premise that a set of latent representations is transferable from the source to the target domain. We address model heterogeneity between the source and target domains by incorporating domain-specific parameters in their respective models. We establish sufficient conditions for the identifiability of the models and show that the estimator of the primary parameter in the target model is consistent and asymptotically normal. These results lay the theoretical groundwork for statistical inference on the main effects. Simulation studies highlight the benefits of the method, and we illustrate its practical application using real-world data.