In the transfer learning paradigm, models learn useful representations (or features) during a data-rich pretraining stage and then use the pretrained representations to improve performance on data-scarce downstream tasks. In this work, we explore transfer learning with the goal of optimizing downstream performance. We introduce a simple linear model that takes as input an arbitrary pretrained feature transform. We derive exact asymptotics of the downstream risk and its fine-grained bias-variance decomposition. Our findings suggest that using the ground-truth featurization can result in a "double divergence" of the asymptotic risk, indicating that it is not necessarily optimal for downstream performance. We then identify the optimal pretrained representation by minimizing the asymptotic downstream risk averaged over an ensemble of downstream tasks. Our analysis reveals the relative importance of learning the task-relevant features and learning the structure of the data covariates, and it characterizes how each contributes to controlling the downstream risk from a bias-variance perspective. Moreover, we uncover a phase transition in which the optimal pretrained representation changes from hard to soft selection of relevant features, and we discuss its connection to principal component regression.
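As a rough illustration of the setting described above, the following sketch fits a downstream linear model on top of a fixed pretrained feature transform and estimates the resulting risk by simulation. It is not the paper's model or its asymptotic analysis: the isotropic Gaussian covariates, the min-norm least-squares fit, the sparse task vector, and all names (`downstream_risk`, `average_risk`, `A_identity`, `A_select`) are assumptions chosen for illustration.

```python
import numpy as np


def downstream_risk(A, beta, n, sigma, rng):
    """Excess risk of a min-norm least-squares fit on features z = A x.

    Assumption: covariates are x ~ N(0, I_d), so the excess risk
    E[(x^T beta - z^T w)^2] equals ||beta - A^T w||^2.
    """
    d = beta.shape[0]
    X = rng.standard_normal((n, d))                # downstream covariates
    y = X @ beta + sigma * rng.standard_normal(n)  # noisy downstream labels
    Z = X @ A.T                                    # pretrained features, shape (n, k)
    w = np.linalg.pinv(Z) @ y                      # min-norm least-squares weights
    err = beta - A.T @ w                           # error in the original coordinates
    return float(err @ err)


def average_risk(A, beta, n, sigma, trials, seed):
    """Monte Carlo average of the downstream risk over independent draws."""
    rng = np.random.default_rng(seed)
    risks = [downstream_risk(A, beta, n, sigma, rng) for _ in range(trials)]
    return float(np.mean(risks))


# Hypothetical problem sizes: only the first k of d coordinates carry signal,
# and the sample size n sits at the interpolation threshold n = d.
d, k, n, sigma = 50, 5, 50, 0.5
beta = np.zeros(d)
beta[:k] = 1.0

A_identity = np.eye(d)   # keep all d raw coordinates as features
A_select = np.eye(d)[:k]  # hard selection of the k relevant coordinates

risk_identity = average_risk(A_identity, beta, n, sigma, trials=200, seed=0)
risk_select = average_risk(A_select, beta, n, sigma, trials=200, seed=0)

print(f"risk with all coordinates:   {risk_identity:.3f}")
print(f"risk with hard selection:    {risk_select:.3f}")
```

Under these assumptions, regressing on all coordinates at n = d produces a very large variance term, while hard selection of the relevant coordinates keeps the risk small. The sketch only shows how the choice of feature transform can change the downstream risk; the soft-selection regime and the exact asymptotics in the abstract are outside its scope.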