While Large Language Models (LLMs) have demonstrated exceptional multitasking abilities, fine-tuning these models on downstream, domain-specific datasets is often necessary to achieve superior test-set performance over their non-fine-tuned counterparts. However, the overall effects of fine-tuning on the generalization ability of LLMs are not yet fully understood. This paper examines the differences between original, unmodified LLMs and their fine-tuned variants. Our primary investigation centers on whether fine-tuning affects the generalization ability intrinsic to LLMs. To this end, we conduct extensive experiments across five distinct language tasks on various datasets. Our main findings reveal that models fine-tuned on generation and classification tasks exhibit different behaviors when generalizing to other domains and tasks. Intriguingly, we observe that integrating an in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability. Through this systematic investigation, we aim to contribute valuable insights into the evolving landscape of fine-tuning practices for LLMs.
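One common way to integrate in-context learning into fine-tuning, sketched below as an illustration only (the paper's exact prompt format and the function name `build_icl_finetune_prompt` are assumptions, not taken from the source), is to prepend a few demonstration pairs to each training example so that the fine-tuned model sees the same prompt structure it will receive at inference time:

```python
def build_icl_finetune_prompt(demos, query, answer=None):
    """Build a fine-tuning prompt that embeds in-context demonstrations.

    demos:  list of (input, output) pairs used as in-context examples.
    query:  the target input for this training example.
    answer: the target output; included during training, omitted at
            inference so the model completes it.
    """
    parts = []
    for x, y in demos:
        # Each demonstration is rendered in the same Input/Output
        # template as the target example.
        parts.append(f"Input: {x}\nOutput: {y}")
    target = f"Input: {query}\nOutput:"
    if answer is not None:
        target += f" {answer}"
    parts.append(target)
    # Demonstrations and the target example are separated by blank lines.
    return "\n\n".join(parts)
```

During fine-tuning, each training instance is wrapped this way (with the loss typically restricted to the answer tokens); at inference, the same function is called with `answer=None` so the model generates the completion.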