Multilingual large language models (LLMs) have gained significant popularity for their ability to process and generate text across multiple languages. However, deploying these models in production can be inefficient when only a subset of the supported languages is of interest. Prior work has examined whether machine translation models contain language-specific or language-agnostic attention heads; to the best of our knowledge, however, no such analysis exists for multilingual LLMs, which perform diverse tasks beyond translation. This paper explores whether multilingual LLMs have attention heads specialized for individual languages, and investigates whether language-specific heads for unwanted languages can be removed without degrading performance in the targeted languages. Our findings could inform more efficient deployment strategies for multilingual LLMs, reducing model complexity while maintaining high accuracy for the targeted languages.