Finetuning has emerged as a key strategy for out-of-distribution (OOD) generalization tasks. While most prior work has focused on optimizing learning algorithms, our research highlights how the choice of pre-trained model for finetuning affects OOD performance and inference uncertainty. Working within the model-size constraints of a single GPU, we examined the impact of varying the pre-training dataset and the number of model parameters on metrics such as accuracy and expected calibration error (ECE). Our findings show that pre-trained model selection has a significant influence, yielding markedly larger performance improvements than the choice of learning algorithm. Larger models performed best, although the balance between memorization and true generalization merits further investigation. Ultimately, our research emphasizes the importance of pre-trained model selection for improving OOD generalization.
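The abstract reports expected calibration error (ECE) alongside accuracy as a measure of inference uncertainty. The code below is a minimal sketch of that metric. It assumes the common equal-width binning definition, ECE = Σ_m (|B_m|/n) · |acc(B_m) − conf(B_m)|, where each prediction is assigned to a bin by its top-class confidence. The exact number of bins and the binning scheme used in this work are not stated here. The function name `expected_calibration_error` and the default of 15 bins are illustrative choices only.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 15,  # illustrative default; not taken from the paper
) -> float:
    """Equal-width-bin ECE: sum over bins of (|B|/n) * |acc(B) - conf(B)|."""
    n = len(confidences)
    if n == 0 or not (n == len(predictions) == len(labels)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-bin accumulators: sample count, number correct, summed confidence.
    counts = [0] * n_bins
    correct = [0] * n_bins
    conf_sums = [0.0] * n_bins

    for conf, pred, label in zip(confidences, predictions, labels):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Bins are [m/M, (m+1)/M); a confidence of exactly 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        correct[idx] += int(pred == label)
        conf_sums[idx] += conf

    ece = 0.0
    for count, num_correct, conf_sum in zip(counts, correct, conf_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = num_correct / count
        avg_conf = conf_sum / count
        # Weight each bin's accuracy/confidence gap by its share of the samples.
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece


# Usage: four predictions with their top-class confidences, using 10 bins.
print(expected_calibration_error([0.95, 0.85, 0.65, 0.55], [1, 0, 1, 1], [1, 0, 0, 1], n_bins=10))
```

A lower ECE means that the predicted confidences track the observed accuracy more closely. Equal-width binning is the simplest variant. Other variants, such as equal-mass bins, can give different values on the same predictions.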