Analysis of Task Transferability in Large Pre-trained Classifiers

Transfer learning carries the knowledge a model acquires on a source task over to multiple downstream target tasks with minimal fine-tuning. Its success at improving performance, especially with large pre-trained models, has made it an essential tool in the machine learning toolbox. However, the conditions under which performance transfers to downstream tasks are not well understood. In this work, we analyze the transfer of performance for classification tasks when only the last linear layer of the source model is fine-tuned on the target task. We propose a novel Task Transfer Analysis approach that transforms the source distribution (and classifier) by changing the class prior distribution, the label space, and the feature space to produce a new source distribution (and classifier), which allows us to relate the loss on the downstream task (i.e., transferability) to that on the source task. Concretely, our bound explains transferability in terms of the Wasserstein distance between the transformed source distribution and the downstream task's distribution, the conditional entropy between the label distributions of the two tasks, and the weighted loss of the source classifier on the source task. Moreover, we propose an optimization problem for learning the transforms of the source task that minimize the upper bound on transferability. We perform a large-scale empirical study using state-of-the-art pre-trained models and demonstrate the effectiveness of our bound and optimization at predicting transferability. Our experiments show how factors such as task relatedness, pretraining method, and model architecture affect transferability.
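As an illustration of the setting analyzed here (a frozen source model with only the last linear layer trained on the target task), the following is a minimal sketch in NumPy. Everything in it is an assumption made for illustration and not taken from this work: the "backbone" is a fixed random ReLU projection standing in for a pre-trained feature extractor, the target task is a toy three-class Gaussian mixture, and the names `frozen_features` and `fit_linear_head` are hypothetical. The final cross-entropy on the target data plays the role of the downstream loss that the abstract calls transferability.

```python
import numpy as np

def frozen_features(x, w_frozen):
    """Hypothetical stand-in for a frozen pre-trained backbone (never updated)."""
    h = np.maximum(x @ w_frozen, 0.0)               # fixed ReLU projection
    # Standardize features (an assumption, to keep gradient descent well-conditioned).
    return (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-8)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    return float(-np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))

def fit_linear_head(feats, y, num_classes, lr=0.1, steps=500):
    """Train only the last linear layer with full-batch gradient descent."""
    n, d = feats.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        probs = softmax(feats @ w + b)
        grad_logits = (probs - onehot) / n          # gradient of mean cross-entropy w.r.t. logits
        w -= lr * feats.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return w, b

rng = np.random.default_rng(0)
w_frozen = rng.normal(size=(5, 16)) / np.sqrt(5)    # frozen "pre-trained" weights (toy)

# Toy target task: three Gaussian blobs in 5 dimensions.
y_target = rng.integers(0, 3, size=300)
centers = rng.normal(scale=3.0, size=(3, 5))
x_target = centers[y_target] + rng.normal(size=(300, 5))

feats = frozen_features(x_target, w_frozen)
loss_before = cross_entropy(softmax(np.zeros((300, 3))), y_target)   # untrained head
w, b = fit_linear_head(feats, y_target, num_classes=3)
loss_after = cross_entropy(softmax(feats @ w + b), y_target)         # target loss after probing
print(f"target loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because the backbone weights are never updated, the achievable target loss depends entirely on how well the frozen features separate the target classes, which is the quantity the bound described above aims to explain.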
