Instruction tuning (IT) is widely used to teach pretrained large language models (LLMs) to follow arbitrary instructions, but is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, where an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We investigate the influence of model configuration choices and devise a multi-faceted evaluation strategy for multilingual instruction following. We find that cross-lingual transfer does happen successfully in IT even if all stages of model training are English-centric, but only if multilinguality is taken into account in hyperparameter tuning and the IT data is large enough. English-trained LLMs are capable of generating correct-language, comprehensive, and helpful responses in other languages, but suffer from low factuality and may occasionally make fluency errors.