We study the generalization behavior of transfer learning of deep neural networks (DNNs). We adopt the overparameterization perspective, which features interpolation of the training data (i.e., approximately zero training error) and the double descent phenomenon, to explain the delicate effect of the transfer learning setting on generalization performance. We study how the generalization behavior of transfer learning is affected by the dataset sizes of the source and target tasks, the number of transferred layers that are kept frozen during target DNN training, and the similarity between the source and target tasks. We show that the evolution of the test error during target DNN training exhibits a more pronounced double descent effect when the target training dataset is sufficiently large. In addition, a larger source training dataset can slow down target DNN training. Moreover, we demonstrate that the number of frozen layers can determine whether the transfer learning is effectively underparameterized or overparameterized and, in turn, this may induce a freezing-wise double descent phenomenon that determines the relative success or failure of learning. We also show that the double descent phenomenon may make a transfer from a less related source task better than a transfer from a more related source task. We establish our results using image classification experiments with the ResNet, DenseNet, and vision transformer (ViT) architectures.