Instruction tuning (IT) is widely used to teach pretrained large language models (LLMs) to follow arbitrary instructions, but is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, in which an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We advocate for the importance of evaluating various aspects of model responses in multilingual instruction following and investigate the influence of different model configuration choices. We find that cross-lingual transfer does occur in IT even when all stages of model training are English-centric, but only if multilinguality is taken into account during hyperparameter tuning and if the IT data are large enough. English-trained LLMs are capable of generating correct-language, comprehensive, and helpful responses in other languages, but suffer from low factuality and may occasionally exhibit fluency errors.