The success of ChatGPT validates the potential of large language models (LLMs) for artificial general intelligence (AGI). The subsequent release of LLMs has sparked the open-source community's interest in instruction-tuning, which is expected to accelerate the replication of ChatGPT. However, research on instruction-tuning LLMs in Chinese, the world's most spoken language by number of native speakers, is still in its early stages. This paper therefore presents an in-depth empirical study of instruction-tuning LLMs in Chinese, intended as a cookbook of findings for effectively customizing LLMs to respond better to Chinese instructions. Specifically, we systematically explore the impact of LLM bases, parameter-efficient methods, and instruction data types, the three most important elements of instruction-tuning. We also conduct experiments to study the impact of other factors, e.g., chain-of-thought data and human-value alignment. We hope that this empirical study can make a modest contribution to an open Chinese version of ChatGPT. This paper will release a powerful Chinese LLM that is comparable to ChatGLM. The code and data are available at https://github.com/PhoebusSi/Alpaca-CoT.