Transfer learning and ensembling are two popular techniques for improving the performance and robustness of neural networks. Because pre-training is expensive, ensembles in practice are often built from models fine-tuned from a single pre-trained checkpoint. Such models end up in the same basin of the loss landscape, which we call the pre-train basin, and thus have limited diversity. In this work, we show that ensembles trained from a single pre-trained checkpoint can be improved by exploring the pre-train basin more thoroughly; however, leaving the basin forfeits the benefits of transfer learning and degrades ensemble quality. Based on an analysis of existing exploration methods, we propose StarSSE, a more effective modification of Snapshot Ensembles (SSE) for the transfer learning setup, which yields stronger ensembles and uniform model soups.
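The abstract contrasts two ways of combining fine-tuned models: an ensemble, which averages the members' predictions, and a uniform model soup, which averages their parameters into a single model. The sketch below illustrates only that distinction. It does not implement StarSSE. It uses a toy logistic-regression "model" with hypothetical parameters in place of fine-tuned networks. All of the function names are illustrative and do not come from the paper. Weight averaging is generally only meaningful when the members lie in a shared basin, such as the pre-train basin discussed above.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    # Logistic-regression probability for a single input vector x.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def ensemble_predict(models, x):
    # Ensemble: run every member separately and average their output probabilities.
    return sum(predict(w, b, x) for w, b in models) / len(models)


def uniform_soup(models):
    # Uniform soup: average the parameters element-wise to obtain ONE model.
    n = len(models)
    dim = len(models[0][0])
    weights = [sum(m[0][i] for m in models) / n for i in range(dim)]
    bias = sum(m[1] for m in models) / n
    return weights, bias


# Hypothetical members standing in for models fine-tuned from one checkpoint.
models = [([2.0, -1.0], 0.5), ([1.0, 0.0], -0.5), ([3.0, -2.0], 0.0)]
x = [1.0, 1.0]

ens = ensemble_predict(models, x)
soup_w, soup_b = uniform_soup(models)
soup = predict(soup_w, soup_b, x)
print(f"ensemble={ens:.4f} soup={soup:.4f}")
```

Because of the nonlinearity, the two outputs differ. The soup needs a single forward pass at inference time, whereas the ensemble needs one pass per member.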