In this paper, we study how pretraining label granularity affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining labels are more fine-grained than those of the target problem. We experiment with this setting using the label hierarchy of iNaturalist 2021 and observe an 8.76% relative improvement in error rate over the baseline. We find that the following conditions are key to the improvement: 1) the pretraining dataset has a strong and meaningful label hierarchy, 2) its label function aligns strongly with that of the target task, and, most importantly, 3) an appropriate level of pretraining label granularity is chosen. Our transfer learning experiments on ImageNet further corroborate the importance of pretraining label granularity. Most notably, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining at coarser granularity levels, supporting this common practice. Theoretically, through an analysis of a two-layer convolutional ReLU network, we prove that: 1) models trained on coarse-grained labels respond strongly only to the common or "easy-to-learn" features; 2) when the dataset satisfies the right conditions, fine-grained pretraining encourages the model to also learn rarer or "harder-to-learn" features well, thus improving its generalization.
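To make the fine-to-coarse setting concrete, the sketch below shows one way pretraining labels at a chosen granularity level could be derived from a label hierarchy, together with the map from each fine label to its coarse target ancestor. It rests on assumptions: labels are represented as root-to-leaf taxonomy paths, and the function names and the toy taxonomy in the usage note are hypothetical illustrations, not the paper's data pipeline. The `fine_level` argument plays the role of the pretraining granularity knob, and `relative_error_reduction` spells out the usual definition of a relative improvement in error rate.

```python
from typing import Dict, List, Tuple

# Hypothetical label representation: each example's label is a root-to-leaf
# taxonomy path, e.g. ("Animalia", "Aves", "Passeriformes", "Corvus corax").
TaxonPath = Tuple[str, ...]


def labels_at_level(paths: List[TaxonPath], level: int) -> List[str]:
    """Return the node at depth `level` (0 = root) of each path as its class label.

    Paths shorter than `level` fall back to their deepest (leaf) node.
    """
    return [p[min(level, len(p) - 1)] for p in paths]


def fine_to_coarse_map(paths: List[TaxonPath], fine_level: int,
                       coarse_level: int) -> Dict[str, str]:
    """Map each fine-grained (pretraining) label to its ancestor at the target level."""
    if fine_level < coarse_level:
        raise ValueError("pretraining labels must be at least as fine as target labels")
    mapping: Dict[str, str] = {}
    for fine, coarse in zip(labels_at_level(paths, fine_level),
                            labels_at_level(paths, coarse_level)):
        # In a proper tree every fine label has exactly one coarse ancestor.
        if mapping.setdefault(fine, coarse) != coarse:
            raise ValueError(f"label {fine!r} has more than one ancestor")
    return mapping


def relative_error_reduction(baseline_err: float, new_err: float) -> float:
    """Relative improvement of the error rate with respect to a baseline."""
    return (baseline_err - new_err) / baseline_err
```

For example, with a hypothetical three-species taxonomy, `fine_to_coarse_map(paths, 3, 1)` sends each species to its class ("Aves" or "Mammalia"): a model would be pretrained on the species labels and then transferred to the class-level task. The error rates in `relative_error_reduction(0.20, 0.18)` (about 0.10) are made-up numbers that only illustrate the formula.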