Parameter-Efficient Finetuning (PEFT) has emerged as a viable way to improve the performance of Large Language Models (LLMs) without massive resource and compute budgets. Prior work on multilingual evaluation has shown a large gap between LLM performance on English and on other languages. There is also a large gap between smaller open-source models and larger LLMs. Finetuning can be an effective way to narrow these gaps and make language models more equitable. In this work, we finetune the Llama-2-7B and Mistral-7B models on two synthetic multilingual instruction-tuning datasets and measure the effect on model performance across six downstream tasks covering forty languages in total. We also vary parameters such as the rank used in low-rank adaptation and the quantisation level to determine their effects on downstream performance, and find that higher rank and higher quantisation values benefit low-resource languages. PEFT of smaller open-source models sometimes narrows the gap between these models and larger ones, although English performance can suffer. We also find that finetuning sometimes improves performance on low-resource languages while degrading performance on high-resource languages.