The development of state-of-the-art generative large language models (LLMs) relies disproportionately on English-centric tokenizers, vocabularies, and pre-training data. Although some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English, resulting in increased inference time and costs. Cross-lingual vocabulary adaptation (CVA) methods have been proposed for adapting models to a target language with the aim of improving downstream performance. However, the effectiveness of these methods in improving the inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of five CVA methods on four generative LLMs (including monolingual and multilingual models) across four typologically diverse languages and four natural language understanding tasks. We find that CVA substantially contributes to LLM inference speedups of up to 271.5\%. We also show that adapting LLMs pre-trained on more balanced multilingual data yields downstream performance comparable to that of the original models.
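To make the inefficiency concrete, the following minimal sketch (an illustration, not part of the study's experimental setup; it assumes the Hugging Face `transformers` library and uses the English-centric GPT-2 tokenizer with two illustrative example sentences) shows how an English-centric vocabulary fragments non-English text into far more tokens. Since autoregressive generation cost scales with the number of output tokens, a higher token-per-character ratio directly inflates inference time and cost.

```python
# Minimal sketch: compare how many tokens an English-centric BPE
# tokenizer needs for the same sentence in English and in Japanese.
from transformers import AutoTokenizer

# GPT-2's byte-level BPE vocabulary was built largely from English text.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

english = "The quick brown fox jumps over the lazy dog."
japanese = "素早い茶色の狐がのろまな犬を飛び越える。"  # the same sentence in Japanese

for label, text in [("English", english), ("Japanese", japanese)]:
    n_tokens = len(tokenizer.encode(text))
    print(f"{label}: {n_tokens} tokens for {len(text)} characters")

# The Japanese sentence yields several times more tokens per character,
# so generating it requires correspondingly more decoding steps.
```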