The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g., GPT-4. Compared to base LLMs, chat LLMs are further optimized for advanced abilities, e.g., multi-turn conversation and human preference alignment, and are thus stronger in both helpfulness and safety. However, transforming a chat LLM involves two critical issues: (1) How can we effectively transfer advanced abilities without their supervised data? (2) How can we prevent catastrophic forgetting of the original knowledge during transformation? We target these issues by introducing a simple framework called TransLLM. For the first issue, TransLLM divides the transfer problem into several common sub-tasks with the translation chain-of-thought, which uses translation as a step-by-step bridge between English and non-English. We further enhance the performance of these sub-tasks with publicly available data. For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training, which keeps the original LLM parameters frozen, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters. In the experiments, we transform LLaMA-2-chat-7B to Thai. Our method, using only single-turn data, outperforms strong baselines and ChatGPT on the multi-turn benchmark MT-bench. Furthermore, our method, without safety data, rejects more harmful queries from the safety benchmark AdvBench than both ChatGPT and GPT-4.
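The translation chain-of-thought described above can be sketched as a minimal pipeline. The prompt wording, step ordering, and the `llm` callable below are illustrative assumptions for exposition, not the paper's exact prompts or implementation.

```python
# Minimal sketch of a translation chain-of-thought (TCoT) pipeline.
# `llm` stands in for a call to the chat LLM; the prompt templates
# here are assumptions, not the paper's actual format.

from typing import Callable


def translation_cot(query_non_en: str, llm: Callable[[str], str]) -> str:
    """Answer a non-English query via three chained sub-tasks."""
    # Step 1: translate the non-English query into English.
    query_en = llm(f"Translate Thai to English: {query_non_en}")
    # Step 2: answer in English, where the chat LLM is strongest.
    answer_en = llm(f"Answer the question: {query_en}")
    # Step 3: translate the English answer back into the target language.
    answer_non_en = llm(f"Translate English to Thai: {answer_en}")
    return answer_non_en
```

Each step is a sub-task for which public supervised data exists (translation pairs, English instruction data), which is how the framework sidesteps the lack of non-English supervised data for advanced abilities.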