This paper explores an intriguing observation: fine-tuning a large language model (LLM) with responses generated by an LLM often yields better results than using responses generated by humans, particularly in reasoning tasks. We conduct an in-depth investigation to understand why this occurs. Contrary to the common belief that this phenomenon arises from the more detailed nature of LLM-generated content, our study identifies another contributing factor: an LLM is inherently more "familiar" with LLM-generated responses. This familiarity is evidenced by lower perplexity before fine-tuning. We design a series of experiments to examine the impact of this "familiarity," and our results reveal that it significantly affects learning performance. Training with LLM-generated responses not only enhances performance but also helps the model maintain its capabilities on other reasoning tasks after fine-tuning on a specific task.
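The "familiarity" signal above is measured via perplexity: the exponential of the mean negative log-probability the model assigns to a response's tokens. A minimal sketch of that definition, using hypothetical per-token probabilities purely for illustration (not the paper's experimental code):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical values: a model "familiar" with a response assigns its
# tokens higher probability, which yields lower perplexity.
llm_response_probs = [0.60, 0.55, 0.70, 0.65]
human_response_probs = [0.30, 0.25, 0.40, 0.35]

assert perplexity(llm_response_probs) < perplexity(human_response_probs)
```

Lower perplexity on LLM-generated responses before any fine-tuning is the paper's evidence that such responses are easier for the model to learn from.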