We consider fine-tuning a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which is often observed in practice (e.g., when the target dataset is small or when the training labels are noisy). Existing generalization measures for deep networks depend on notions such as the distance of the fine-tuned model from its initialization (i.e., the pretrained network) and the noise stability properties of deep networks. This paper identifies a Hessian-based distance measure through PAC-Bayesian analysis, which is shown to correlate well with the observed generalization gaps of fine-tuned models. Theoretically, we prove Hessian distance-based generalization bounds for fine-tuned models. We also describe an extended study of fine-tuning against label noise, where overfitting is again a critical problem; we present an algorithm and a generalization error guarantee for this algorithm under a class-conditional independent noise model. Empirically, we observe that the Hessian-based distance measure can match the scale of the observed generalization gap of fine-tuned models in practice. We also test our algorithm on several image classification tasks with noisy training labels, showing notable gains over prior methods, along with a substantial decrease in the Hessian distance measure of the fine-tuned model.
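For readers unfamiliar with the term, the following is a minimal sketch of what a class-conditional independent label-noise model usually means: each training label is resampled with probabilities that depend only on its true class, never on the input itself. The abstract does not give the noise rates or the transition matrix used in the paper, so the symmetric matrix, the 20% flip rate, and the helper names `make_symmetric_transition` and `corrupt_labels` are illustrative assumptions only. This is not the paper's algorithm.

```python
import random


def make_symmetric_transition(num_classes, flip_prob):
    """Build a row-stochastic matrix T with T[i][j] = P(noisy = j | true = i).

    Assumption for illustration: noise is symmetric, so a flipped label is
    spread uniformly over the other classes.
    """
    off_diagonal = flip_prob / (num_classes - 1)
    return [
        [1.0 - flip_prob if i == j else off_diagonal for j in range(num_classes)]
        for i in range(num_classes)
    ]


def corrupt_labels(labels, transition, seed=0):
    """Resample every label from the row of its true class.

    The input features are never consulted, which is what makes the noise
    class-conditional and independent of the input.
    """
    rng = random.Random(seed)
    classes = list(range(len(transition)))
    return [rng.choices(classes, weights=transition[y], k=1)[0] for y in labels]


if __name__ == "__main__":
    T = make_symmetric_transition(num_classes=3, flip_prob=0.2)
    clean = [0, 1, 2] * 1000
    noisy = corrupt_labels(clean, T, seed=0)
    flip_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
    print(f"empirical flip rate: {flip_rate:.3f}")
```

With enough samples, the empirical flip rate should land close to the chosen `flip_prob`. An asymmetric matrix (for example, one class confused mostly with a single other class) fits the same interface by passing a different `transition`.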