Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it remains unclear whether a handful of languages is sufficient, or whether the benefits grow as more languages are included. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to examine three pertinent factors affecting performance: the number of languages, language exposure, and the similarity between training and test languages. Overall, we find that 1) expanding language coverage in multilingual instruction tuning is beneficial; 2) accuracy often improves significantly if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more strongly than the mere number of languages, although different languages benefit to varying degrees.
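To make the experimental variable concrete, the following is a minimal sketch of instruction tuning a BLOOM checkpoint on a k-language subset of a multilingual instruction corpus. The dataset name, field names, model size, and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: instruction-tune a BLOOM checkpoint on a k-language subset.
# "my-org/multilingual-instructions", its fields, and all hyperparameters
# are hypothetical placeholders, not the paper's configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

LANGS = ["en", "zh", "fr", "es"]   # hypothetical k-language subset (k = 4)
MODEL = "bigscience/bloom-560m"    # small stand-in for a larger BLOOM model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Assumed corpus with "lang", "instruction", and "output" columns.
data = load_dataset("my-org/multilingual-instructions", split="train")
data = data.filter(lambda ex: ex["lang"] in LANGS)

def to_features(ex):
    # Concatenate prompt and response into a single causal-LM training string.
    text = f"Instruction: {ex['instruction']}\nResponse: {ex['output']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = data.map(to_features, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bloom-it-4lang",
                           per_device_train_batch_size=4,
                           num_train_epochs=3,
                           learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Varying the `LANGS` list from 1 up to 52 languages, while holding the rest of the recipe fixed, is the kind of sweep the study's three factors (language count, exposure, and similarity) are measured over.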