Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. Explaining the underlying mechanisms by which LLMs process multilingual texts remains a challenging problem. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on two representative LLMs, namely LLaMA-2 and BLOOM. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence for understanding and exploring the multilingual capabilities of LLMs.
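To make the idea behind LAPE concrete, the following is a minimal sketch based only on what the name suggests, not on the exact procedure used in the paper. It assumes that, for each neuron, we already have an estimate of how often it activates on text in each language (the hypothetical array `act_prob`). Each neuron's row is normalized into a distribution over languages, its entropy is computed, and the lowest-entropy neurons are treated as language-specific. The function names, the `fraction` cutoff, and the toy numbers are all illustrative assumptions.

```python
import numpy as np

def lape_scores(act_prob, eps=1e-12):
    """Entropy of each neuron's activation distribution over languages.

    act_prob: array of shape (num_neurons, num_languages); entry [i, j] is an
    assumed estimate of how often neuron i activates on text in language j.
    """
    act_prob = np.asarray(act_prob, dtype=float)
    # Normalize each neuron's row so it sums to 1 across languages.
    dist = act_prob / (act_prob.sum(axis=1, keepdims=True) + eps)
    # Shannon entropy: low values mean activation concentrates on few languages.
    return -(dist * np.log(dist + eps)).sum(axis=1)

def select_language_specific(act_prob, fraction=0.01):
    """Return indices of the lowest-entropy neurons (hypothetical cutoff)."""
    scores = lape_scores(act_prob)
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[:k]

# Toy data (made up for illustration): 4 neurons, 3 languages.
act_prob = np.array([
    [0.90, 0.02, 0.01],  # fires mostly on language 0
    [0.30, 0.31, 0.29],  # roughly language-agnostic
    [0.01, 0.01, 0.80],  # fires mostly on language 2
    [0.50, 0.50, 0.50],  # exactly uniform across languages
])
scores = lape_scores(act_prob)
chosen = select_language_specific(act_prob, fraction=0.5)
```

In this toy setting, the two neurons whose activations concentrate on a single language receive the lowest entropy and are the ones selected. "Steering" as described in the abstract would then amount to setting the activations of such neurons to zero or to a fixed value during the forward pass, which this sketch does not implement.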