Finetuning language models (LMs) is crucial for adapting them to downstream data and tasks. However, full finetuning is usually costly. Existing work, such as parameter-efficient finetuning (PEFT), often focuses on \textit{how to finetune} but neglects the question of \textit{where to finetune}. As a pioneering effort to answer where to finetune (at the layer level), we conduct a semantic analysis of the LM inference process. We first propose a virtual transition of the latent representation and then trace its factual transition. Based on the deviation between the two transitions, we estimate the gain of finetuning each model layer and, further, narrow down the scope for finetuning. We perform extensive experiments across well-known LMs and datasets. The results show that our approach is effective and efficient, and outperforms existing baselines. Our approach is orthogonal to existing efficient techniques, such as PEFT methods, offering practical value for LM finetuning.
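The layer-selection idea above can be sketched in code. This is a minimal, hypothetical illustration only: the abstract does not specify how the virtual transition or the per-layer gain is defined, so here the virtual transition is assumed to be a straight-line interpolation between the input and final latent representations, and the gain of a layer is assumed to be how much its step deviates from that virtual path. The function name `layer_finetuning_gains` and all formulas are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def layer_finetuning_gains(hidden_states):
    """Estimate a per-layer finetuning gain from the deviation between a
    virtual transition of the latent representation and the factual
    transition traced through the model's layers.

    hidden_states: array of shape (L + 1, d) -- the representation after
    the embedding layer (index 0) and after each of the L layers.
    """
    h = np.asarray(hidden_states, dtype=float)
    L = h.shape[0] - 1
    # Virtual transition (an assumption here): linearly interpolate from
    # the input representation h[0] to the final representation h[-1].
    alphas = np.linspace(0.0, 1.0, L + 1)[:, None]
    virtual = (1.0 - alphas) * h[0] + alphas * h[-1]
    # Deviation of the factual transition from the virtual one, per point.
    deviation = np.linalg.norm(h - virtual, axis=1)
    # Assumed gain of finetuning layer l: how much layer l's step changes
    # the deviation (absolute difference of consecutive deviations).
    gains = np.abs(np.diff(deviation))
    return gains

# Toy example: 4 layers, 8-dimensional representations.
rng = np.random.default_rng(0)
states = np.cumsum(rng.normal(size=(5, 8)), axis=0)
gains = layer_finetuning_gains(states)
top_layers = np.argsort(gains)[::-1][:2]  # narrow finetuning to top-2 layers
print(gains, top_layers)
```

In this sketch, layers whose factual step diverges most from the virtual path receive the largest gain estimates, and finetuning would then be restricted to those layers, with the rest kept frozen.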