Large language models show compelling performance on reasoning tasks, but they tend to perform much worse in languages other than English. This is unsurprising given that their training data largely consists of English text and instructions. A typical solution is to translate instruction data into all languages of interest and then train on the resulting multilingual data, an approach called translate-training. This approach not only incurs high cost, but also results in poorly translated data due to the non-standard formatting of chain-of-thought and mathematical reasoning instructions. In this paper, we explore the benefits of question alignment, where we train the model to translate reasoning questions into English by finetuning on X-English question data. In this way, we perform targeted, in-domain language alignment that makes the best use of English instruction data to unlock the LLMs' multilingual reasoning abilities. Experimental results on LLaMA2-13B show that question alignment leads to consistent improvements over the translate-training approach: an average accuracy improvement of 11.3% and 16.1% across ten languages on the MGSM and MSVAMP maths reasoning benchmarks, respectively (The project will be available at: https://github.com/NJUNLP/QAlign).