Transfer learning is an emerging paradigm for leveraging multiple sources to improve statistical inference on a single target. In this paper, we propose a novel approach, residual importance weighted transfer learning (RIW-TL), for high-dimensional linear models built on penalized likelihood. Whereas existing methods such as Trans-Lasso select sources in an all-in-all-out manner, RIW-TL includes individual samples via importance weighting and may therefore use the available samples more effectively. To determine the weights, RIW-TL requires only one-dimensional densities that depend on residuals, thereby avoiding the curse of dimensionality incurred by naive importance weighting, which must estimate high-dimensional densities. We show that the oracle RIW-TL attains a faster convergence rate than its competitors, and we develop a cross-fitting procedure to estimate this oracle. We also discuss variants of RIW-TL that adopt different choices of residual weighting. The theoretical properties of RIW-TL and its variants are established and compared with those of LASSO and Trans-Lasso. Extensive simulations and a real data analysis confirm its advantages.
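To make the idea of weighting source samples through one-dimensional residual densities concrete, the following is a minimal sketch and not the paper's estimator. It assumes a simple density-ratio heuristic: residuals are computed under a pilot fit on the target data, each source sample is weighted by the ratio of a kernel density estimate of the target residuals to one of the source residuals, and a weighted Lasso is then fitted to the pooled data. All function names, the simulated data, the bandwidth rule, and the tuning value `lam=0.05` are illustrative assumptions. The cross-fitting step mentioned in the abstract is omitted for brevity.

```python
import math
import random
import statistics


def kde(points):
    """1-D Gaussian kernel density estimate (normal-reference bandwidth)."""
    h = 1.06 * statistics.stdev(points) * len(points) ** (-0.2)
    norm = len(points) * h * math.sqrt(2.0 * math.pi)
    return lambda r: sum(math.exp(-0.5 * ((r - q) / h) ** 2) for q in points) / norm


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def weighted_lasso(X, y, w, lam, n_iter=100):
    """Coordinate descent for (1/(2W)) * sum_i w_i (y_i - x_i'b)^2 + lam * ||b||_1,
    where W = sum_i w_i."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta at beta = 0
    total = sum(w)
    for _ in range(n_iter):
        for j in range(p):
            # Weighted correlation of feature j with the partial residual.
            rho = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / total
            denom = sum(w[i] * X[i][j] ** 2 for i in range(n)) / total
            new = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def residuals(X, y, beta):
    return [yi - sum(a * b for a, b in zip(xi, beta)) for xi, yi in zip(X, y)]


def simulate(n, beta, rng, noise_sd=0.5):
    """Draw n samples from a Gaussian linear model (illustrative data only)."""
    X = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    y = [sum(a * b for a, b in zip(xi, beta)) + rng.gauss(0.0, noise_sd) for xi in X]
    return X, y


def demo(seed=0):
    rng = random.Random(seed)
    beta_true = [1.5, -1.0, 0.0, 0.0, 0.0]
    beta_shift = [b + 2.0 for b in beta_true]  # a deliberately misleading source
    X_t, y_t = simulate(40, beta_true, rng)    # target sample
    X_a, y_a = simulate(60, beta_true, rng)    # source samples sharing the target model
    X_b, y_b = simulate(60, beta_shift, rng)   # source samples from the shifted model
    X_s, y_s = X_a + X_b, y_a + y_b
    is_informative = [True] * 60 + [False] * 60

    # Pilot fit on the target only (in-sample residuals; no cross-fitting here).
    pilot = weighted_lasso(X_t, y_t, [1.0] * len(y_t), lam=0.05)
    f_target = kde(residuals(X_t, y_t, pilot))
    res_s = residuals(X_s, y_s, pilot)
    f_source = kde(res_s)

    # Assumed heuristic: weight = target residual density / source residual density.
    weights = [f_target(r) / f_source(r) for r in res_s]

    # Pooled weighted Lasso: target samples get weight 1, source samples their ratio.
    beta_hat = weighted_lasso(X_t + X_s, y_t + y_s, [1.0] * len(y_t) + weights, lam=0.05)
    return weights, is_informative, beta_hat


weights, is_informative, beta_hat = demo()
mean_inf = statistics.mean(w for w, f in zip(weights, is_informative) if f)
mean_non = statistics.mean(w for w, f in zip(weights, is_informative) if not f)
print("mean weight, informative sources:", mean_inf)
print("mean weight, shifted sources:", mean_non)
print("pooled weighted Lasso estimate:", beta_hat)
```

In this toy setup, source samples drawn from the shifted model tend to have large residuals under the pilot fit and so receive small weights, while samples consistent with the target are retained. This shows sample-level weighting in contrast to an all-in-all-out choice of sources; only two one-dimensional densities are estimated, regardless of the number of covariates.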