Creating multilingual LLMs poses a significant challenge. Pretraining or fine-tuning an LLM to support a new language is costly. In addition, the benchmark datasets and metrics available for measuring model performance in multilingual settings are limited. This paper proposes cost-effective solutions to both challenges. First, we introduce the Multilingual Instruction-Tuning Dataset (MITS), which comprises translations of Alpaca-52K, Dolly-15K, and the Vicuna Benchmark into 132 languages. Second, we propose a new method called \emph{TaCo: Translation-Assisted Cross-Linguality}, which uses translations in a chain-of-thought process to instruction-tune LLMs on new languages through curriculum learning. As a proof of concept, we took the instruction-tuned Guanaco-33B model and performed further instruction tuning with the TaCo method in three low-resource languages and one high-resource language. In a GPT-4 evaluation on the Vicuna Benchmark dataset, the TaCo method scores 82\% for a low-resource language, double the score obtained with instruction tuning alone. These results suggest that TaCo is a promising approach to creating multilingual LLMs, even for low-resource languages. We have released our datasets and model adapters\footnote{https://github.com/UNHSAILLab/TaCo}, and we encourage the research community to use these resources to advance work on multilingual LLMs.