Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly with language control, that is, the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To surface these failures systematically, we design a four-scenario evaluation protocol spanning the MMLU, MGSM, and XQuAD benchmarks. To examine their internal causes, we extend logit lens analysis to track language probabilities layer by layer and compute the cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final layers, which are responsible for language control. On Qwen-3-32B and Bloom-7.1B, this method achieves over 98 percent language consistency across six languages while fine-tuning only 3-5 percent of parameters, without sacrificing task accuracy. This result is nearly identical to that of full-scope fine-tuning (for example, both methods exceed 98 percent language consistency across all prompt scenarios) but requires only a fraction of the computational resources. To the best of our knowledge, this is the first approach to leverage layer-localization of language control for efficient multilingual adaptation.
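The layer-by-layer language tracking described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-layer next-token logits have already been obtained by projecting each layer's hidden state through the model's output head (the usual logit lens step), and that every vocabulary token has been assigned a language label. The names `language_probs_by_layer`, `layer_logits`, and `token_lang`, as well as the toy numbers, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def language_probs_by_layer(layer_logits, token_lang):
    """For each layer, sum the next-token probability mass per language.

    layer_logits: one list of vocabulary logits per layer (assumed to come
                  from a logit-lens projection of that layer's hidden state).
    token_lang:   language label for each vocabulary index (an assumption;
                  real vocabularies need a token-to-language mapping).
    """
    results = []
    for logits in layer_logits:
        probs = softmax(logits)
        mass = {}
        for p, lang in zip(probs, token_lang):
            mass[lang] = mass.get(lang, 0.0) + p
        results.append(mass)
    return results


# Toy vocabulary of four tokens: two English, two Chinese (illustrative only).
token_lang = ["en", "en", "zh", "zh"]

# Made-up logits for three layers, not measurements from any model.
layer_logits = [
    [2.0, 1.0, 0.0, 0.0],  # early layer
    [1.0, 1.0, 1.0, 1.0],  # middle layer
    [0.0, 0.0, 2.0, 3.0],  # late layer
]

for i, mass in enumerate(language_probs_by_layer(layer_logits, token_lang)):
    print(i, {lang: round(p, 3) for lang, p in mass.items()})
```

Plotting the per-language mass against layer index is one way to see where the target language starts to dominate, which is the kind of evidence that would motivate restricting fine-tuning to the final layers.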