The Role of Fine-tuning: Transfer Learning for High-dimensional M-estimators with Decomposable Regularizers

Transfer learning algorithms have been developed in various application contexts, but only a few of them offer statistical guarantees in high dimensions. In these works, the differences between the target and the sources, known as the contrasts, are typically modeled as, or assumed close to, vectors with a certain low-dimensional structure (e.g., sparsity), which leads to a separate debiasing step after an initial pooling estimation step. Under this intuitive yet powerful framework, additional homogeneity conditions on the Hessian matrices of the population loss functions are often imposed to preserve the delicate low-dimensional structure of the contrasts during pooling; these conditions are either unrealistic in practice or easily destroyed by basic data transformations such as standardization. In this article, under the general framework of M-estimators with decomposable regularizers, we highlight the role of fine-tuning that underlies the conspicuous gain of the debiasing step in transfer learning. Namely, we find that estimation accuracy can be enhanced by fine-tuning a primal estimator that is sufficiently close to the true target parameter. Our theory suggests slightly enlarging the pooling regularization strength when either the low-dimensional structure of the contrasts or the homogeneity of the Hessian matrices is violated. Traditional linear regression and generalized low-rank trace regression in high dimensions are discussed as two specific examples under our framework. When the informative source datasets are unknown, we propose a novel truncated-penalized algorithm that directly outputs the primal estimator while simultaneously selecting the useful sources, and we prove its oracle property. Extensive numerical experiments are conducted to validate the theoretical assertions. A case study applying transfer learning to air quality regulation in China is also provided for illustration.
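To make the pooling and fine-tuning steps concrete, the following is a minimal sketch for the linear regression example with an ℓ1 penalty, which is one instance of a decomposable regularizer. The function names (`lasso_ista`, `pool_then_fine_tune`), the proximal-gradient solver, and the tuning parameters `lam_pool` and `lam_tune` are illustrative assumptions and are not taken from the paper. The sketch pools all supplied sources and does not implement the truncated-penalized source selection.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, beta0=None, n_iter=500):
    """Proximal gradient (ISTA) for
        (1 / 2n) * ||y - X b||^2 + lam * ||b - beta0||_1.
    With beta0=None the penalty is centered at zero (ordinary lasso).
    """
    n, p = X.shape
    center = np.zeros(p) if beta0 is None else np.asarray(beta0, dtype=float)
    # Step size 1/L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = center.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        # Proximal step for a penalty centered at `center`.
        b = center + soft_threshold(b - step * grad - center, step * lam)
    return b


def pool_then_fine_tune(X_t, y_t, sources, lam_pool, lam_tune):
    """Hypothetical two-step estimator.

    Step 1 (pooling): lasso on the target stacked with all source datasets.
    Step 2 (fine-tuning): lasso on the target only, shrinking toward the
    pooled estimate instead of toward zero.
    `sources` is a list of (X, y) pairs.
    """
    X_all = np.vstack([X_t] + [X for X, _ in sources])
    y_all = np.concatenate([y_t] + [y for _, y in sources])
    beta_pool = lasso_ista(X_all, y_all, lam_pool)
    return lasso_ista(X_t, y_t, lam_tune, beta0=beta_pool)
```

In this sketch a very large `lam_tune` returns the pooled estimate unchanged, while a small `lam_tune` lets the target data move the estimate away from it. How to choose `lam_pool` and `lam_tune` is left open here.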
