ChatGPT and GPT-4 have attracted substantial interest from both academic and industrial circles, owing to their remarkable few-shot (or even zero-shot) ability to handle various tasks. Recent work shows that, after being fine-tuned with a few sets of instruction-driven data, the recently proposed LLM, LLaMa, exhibits an impressive capability to address a broad range of tasks. However, the zero-shot performance of LLMs does not consistently outperform that of models fine-tuned for specific scenarios. To explore whether the capabilities of LLMs can be further enhanced for specific scenarios, we choose writing assistance as the testbed, covering seven writing tasks. We collect training data for these tasks, reframe them in an instruction-following format, and subsequently fine-tune LLaMa via instruction tuning. Experimental results show that continually fine-tuning LLaMa on writing instruction data significantly improves its performance on writing tasks. We also conduct further experiments and analyses to offer insights for future work on effectively fine-tuning LLaMa for specific scenarios.