Transfer learning is widely used to adapt large pretrained models to new tasks with only a small amount of new data. However, a challenge persists: the features learned on the original task often do not fully cover what is needed for unseen data, especially when the two tasks are not clearly related. Since deep learning models tend to learn very sparse representations, they retain only the minimal features required for the initial training while discarding features that could be useful for downstream transfer. A theoretical framework developed in this work demonstrates that such pretraining captures inconsistent aspects of the data distribution, thereby inducing transfer bias. To address this limitation, we propose an inexpensive ensembling strategy that aggregates multiple models to generate richer feature representations. On ResNet, this approach yields a $9\%$ improvement in transfer accuracy without incurring extra pretraining cost. We also present empirical evidence from a range of deep learning studies, confirming that the phenomenon is pervasive across modern deep learning architectures. These results suggest that relying solely on large pretrained networks is not always the most effective way to improve model generalization. Instead, fostering richer, more diverse representations, e.g., through model ensembles, can substantially enhance transfer learning performance.
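The ensembling idea can be illustrated with a minimal sketch: features from several independently trained extractors are concatenated into one wider representation, on which a downstream head is then fit. The random-projection "extractors" below are hypothetical stand-ins for pretrained backbones (e.g., separately trained ResNets), not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_extractor(in_dim, feat_dim, seed):
    # Hypothetical stand-in for an independently pretrained feature
    # extractor: a fixed random projection followed by a ReLU.
    W = np.random.default_rng(seed).standard_normal((in_dim, feat_dim))
    return lambda x: np.maximum(x @ W, 0.0)

# Three "pretrained models", each exposing a 16-dimensional feature map.
extractors = [make_extractor(32, 16, s) for s in (1, 2, 3)]

x = rng.standard_normal((8, 32))  # a small batch of 8 inputs

# Ensemble by concatenating each model's features: no extra pretraining,
# just a wider, more diverse representation for the downstream task head.
feats = np.concatenate([f(x) for f in extractors], axis=1)
print(feats.shape)  # (8, 48)
```

A downstream classifier trained on `feats` sees the union of the individual models' feature sets, which is the mechanism by which the ensemble compensates for any single model's sparse, task-specific representation.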