Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distributions and tasks across the pre-training and fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, and ultimately the generalization of the fine-tuned model on downstream tasks. We apply our framework to analyze the generalization bounds in two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method for the pre-training stage that further enhances the generalization of the fine-tuned model. Overall, our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.