To improve performance on a target task, researchers have fine-tuned language models on an intermediate task before the target task of interest. However, previous work has focused on pre-trained language models and downstream tasks in Natural Language Processing (NLP) and has considered only a single intermediate task. The effect of fine-tuning on multiple intermediate tasks, and of their ordering, on target task performance has not been fully explored in Software Engineering. In this study, we present the first empirical analysis of the impact of task ordering on target task performance. Experimental results show that task ordering does affect target task performance, yielding up to a 6% performance gain and up to a 4% performance loss. To explain this impact, we examine a variety of potential factors, including characteristics of the dataset (syntactic and semantic similarity analysis, dataset size), the model (probing task and attention analysis), and the task (task affinity analysis). Our study provides Software Engineering researchers and practitioners with insights into the effect of task orderings and into selecting an ordering that is cost-effective while achieving the best performance gain.