Proprietary large language models (LLMs), such as ChatGPT, have attracted significant attention for their strong performance across a diverse range of tasks. Recent studies show that smaller open-source foundation models, such as the 7B-parameter LLaMA, can also handle diverse tasks proficiently when fine-tuned on instruction data. In this work, we investigate a practical setting in which the focus is on one or a few specific tasks rather than general-purpose instruction following, and we explore whether LLMs are beneficial, and can be further improved, in such targeted scenarios. We choose the writing-assistant scenario as our testbed, which comprises seven writing tasks. We collect training data for these tasks, reformulate it in an instruction-following format, and then fine-tune an LLM, specifically LLaMA, via instruction tuning. Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its performance on writing tasks. We also conduct further experiments and analyses to offer insights for future work on effectively fine-tuning LLaMA for specific scenarios. Finally, we open a discussion on whether it is necessary to employ an LLM for a single targeted task, taking into account the effort required for tuning and the resources consumed during deployment.
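As a rough illustration of what reformulating task data in an instruction-following format can look like, the minimal sketch below wraps a single (source, target) pair as an instruction example and renders it into one training string. The task name `grammar_correction`, the instruction wording, the Alpaca-style prompt template, and the helper names `to_instruction_record` and `build_training_text` are all assumptions made for this example; none of them are specified in the abstract.

```python
import json

# Hypothetical instruction text for one illustrative writing task; the task
# name and wording are assumptions, not taken from the paper.
INSTRUCTIONS = {
    "grammar_correction": "Fix the grammatical errors in the following text.",
}

# Assumed prompt template (Alpaca-style layout); the source does not specify one.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def to_instruction_record(task, source, target):
    """Wrap a (source, target) pair from a task dataset as an instruction example."""
    return {
        "instruction": INSTRUCTIONS[task],
        "input": source,
        "output": target,
    }


def build_training_text(record):
    """Concatenate the filled prompt and the target into one training string."""
    prompt = PROMPT_TEMPLATE.format(
        instruction=record["instruction"], input=record["input"]
    )
    return prompt + record["output"]


# Usage: convert one made-up example and show both representations.
record = to_instruction_record(
    "grammar_correction",
    "She go to school yesterday.",
    "She went to school yesterday.",
)
print(json.dumps(record, indent=2))
print(build_training_text(record))
```

Storing the instruction, input, and output as separate fields keeps the data independent of any particular prompt layout, so the template can be changed without regenerating the dataset.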