With the release of new large language models (LLMs) such as Llama and Mistral, zero-shot cross-lingual transfer has become increasingly feasible thanks to their multilingual pretraining and strong generalization capabilities. However, adapting these decoder-only LLMs to new tasks across languages remains challenging. While parameter-efficient fine-tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) are widely used, prefix-based techniques such as soft prompt tuning, prefix tuning, and Llama Adapter are less explored, especially for zero-shot transfer in decoder-only models. We present a comprehensive study of three prefix-based methods for zero-shot cross-lingual transfer from English to 35+ high- and low-resource languages. Our analysis further explores transfer across linguistic families and scripts, as well as the impact of scaling model size from 1B to 24B parameters. With Llama 3.1 8B, prefix methods outperform LoRA baselines by up to 6% on the Belebele benchmark, and we observe similar improvements with Mistral v0.3 7B. Despite using only 1.23M trainable parameters with prefix tuning, we achieve consistent improvements across diverse benchmarks. These findings highlight the potential of prefix-based techniques as an effective and scalable alternative to LoRA, particularly in low-resource multilingual settings.
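
To make the parameter-efficiency claim concrete, the sketch below shows one way prefix tuning can be attached to a decoder-only LLM using the Hugging Face peft library. It is a minimal illustration only: the checkpoint id and the number of virtual tokens are assumptions, not the paper's exact configuration, and the resulting trainable-parameter count depends on both choices.

    # Minimal sketch: prefix tuning on a decoder-only LLM with Hugging Face peft.
    # The checkpoint id and num_virtual_tokens are illustrative assumptions,
    # not the configuration reported in the paper.
    from transformers import AutoModelForCausalLM
    from peft import PrefixTuningConfig, TaskType, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
    peft_config = PrefixTuningConfig(
        task_type=TaskType.CAUSAL_LM,   # decoder-only, causal language modeling
        num_virtual_tokens=20,          # length of the learned prefix (assumed value)
    )
    model = get_peft_model(base, peft_config)
    # Only the prefix key/value vectors are trainable; the base model stays frozen.
    model.print_trainable_parameters()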