We study transfer learning for a linear regression task using several least-squares pretrained models that may be overparameterized. We formulate the target learning task as an optimization problem that minimizes the squared error on the target dataset with a penalty on the distance between the learned model and the pretrained models. We analytically characterize the test error of the learned target model and provide corresponding empirical evaluations. Our results elucidate when using more pretrained models improves transfer learning. Specifically, if the pretrained models are overparameterized, using sufficiently many of them is important for beneficial transfer learning. However, learning may be compromised by the overparameterization bias of the pretrained models, i.e., the restriction of the minimum $\ell_2$-norm solution to the small subspace spanned by the training examples in the high-dimensional parameter space. We propose a simple debiasing via a multiplicative correction factor that can reduce the overparameterization bias and leverage more pretrained models to learn a target predictor.
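A minimal sketch of the formulation described above, assuming standard penalized least-squares notation; the target data $(\mathbf{X}, \mathbf{y})$, the number of pretrained models $K$, their parameters $\hat{\boldsymbol{\beta}}_k$, and the penalty weights $\lambda_k$ are illustrative assumptions, not notation fixed by this abstract:
$$\hat{\boldsymbol{\beta}} \;=\; \operatorname*{arg\,min}_{\boldsymbol{\beta}} \;\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 \;+\; \sum_{k=1}^{K} \lambda_k \bigl\|\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}_k\bigr\|_2^2, \qquad \lambda_k \ge 0.$$
Under this reading, the multiplicative debiasing would amount to rescaling each pretrained model, e.g., replacing $\hat{\boldsymbol{\beta}}_k$ by $\alpha\,\hat{\boldsymbol{\beta}}_k$ for some factor $\alpha > 1$ that compensates for the component the minimum $\ell_2$-norm solution loses outside the subspace spanned by its training examples.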