The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabularies, and pre-training data. Although some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English, which increases inference time and cost. Cross-lingual vocabulary adaptation methods have been proposed to adapt models to a target language with the aim of improving downstream performance. However, the effectiveness of these methods in improving the inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of various cross-lingual vocabulary adaptation methods on five generative LLMs (including monolingual and multilingual models) across four typologically diverse languages and four natural language understanding tasks. We find that cross-lingual vocabulary adaptation substantially contributes to LLM inference speedups of up to 271.5%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to the original models.
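To illustrate why an English-centric vocabulary can slow generation in other languages, the following minimal sketch may help. It is not the paper's measurement procedure. The function names `utf8_bytes_per_char` and `speedup_percent` are hypothetical, and the sketch rests on two stated assumptions: that a byte-level tokenizer with few target-language merges falls back toward roughly one token per UTF-8 byte, and that generation time is roughly proportional to the number of generated tokens. The abstract does not state how its speedup figure is defined, so the formula below is only one plausible reading.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text`.

    Used here as a rough proxy (an assumption) for how badly a byte-level,
    English-centric tokenizer may fragment text in a given script.
    """
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)


def speedup_percent(baseline_tokens: int, adapted_tokens: int) -> float:
    """Relative speedup in percent when the same text needs fewer tokens.

    ASSUMPTION: generation time is proportional to the number of generated
    tokens, so time_ratio == baseline_tokens / adapted_tokens.
    """
    if baseline_tokens <= 0 or adapted_tokens <= 0:
        raise ValueError("token counts must be positive")
    return (baseline_tokens / adapted_tokens - 1.0) * 100.0


if __name__ == "__main__":
    # ASCII letters take 1 byte each; hiragana characters take 3 bytes each.
    print(utf8_bytes_per_char("hello"))
    print(utf8_bytes_per_char("こんにちは"))
    # Hypothetical token counts for the same sentence before and after adaptation.
    print(speedup_percent(baseline_tokens=300, adapted_tokens=100))
```

Under these assumptions, a target-language vocabulary that encodes the same text in one third of the tokens would correspond to a 200% speedup; real speedups also depend on embedding and output-layer sizes, hardware, and decoding settings.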