Fine-tuning Large Language Models (LLMs) adapts a trained model to specific downstream tasks, significantly improving task-specific performance. Supervised Fine-Tuning (SFT) is a common approach, in which an LLM is trained to produce desired answers. However, LLMs trained with SFT sometimes make simple mistakes and hallucinate on reasoning tasks such as question answering. Without external feedback, it is difficult for SFT to learn a good mapping between the question and the desired answer, especially with a small dataset. This paper introduces an alternative to SFT called Natural Language Feedback for Finetuning LLMs (LaFFi). LaFFi has LLMs directly predict the feedback they will receive from an annotator. We find that requiring such reflection can significantly improve accuracy on in-domain question-answering tasks, providing a promising direction for applying natural language feedback to the supervised fine-tuning of LLMs. Additional ablation studies show that the proportion of human-annotated data in the annotated datasets affects fine-tuning performance.