Multilingualism in Large Language Models (LLMs) remains an under-explored area. In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models, examining their architecture, activation patterns, and processing mechanisms across languages. We introduce novel metrics to probe the models' multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing. Our findings reveal distinct patterns of multilingual processing in the feed-forward network sublayers of the models. Furthermore, we uncover a phenomenon of "over-layerization" in certain model configurations, where increasing layer depth without corresponding adjustments to other parameters can degrade model performance. Through comparisons within and across languages, we demonstrate the interplay between model architecture, layer depth, and the multilingual processing capabilities of LLMs trained on multiple languages.
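The paper's specific probing metrics are not defined in this abstract; the following is a minimal sketch, under stated assumptions, of one way such a layer-wise multilingual probe could look: mean-pooled hidden states per layer for a pair of parallel sentences, compared with cosine similarity. The model name `gpt2` and the pooling/similarity choices are placeholders for illustration, not the paper's method.

```python
# Minimal sketch: layer-wise cross-lingual activation similarity.
# Assumptions (not from the paper): a HuggingFace causal LM, mean-pooled
# hidden states per layer, and cosine similarity as the probe metric.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute the model family under study

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layer_activations(text: str) -> torch.Tensor:
    """Return one mean-pooled activation vector per layer for the input text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states: tuple of (1, seq_len, hidden) tensors,
    # one per layer (including the embedding layer).
    return torch.stack([h.mean(dim=1).squeeze(0) for h in outputs.hidden_states])

def crosslingual_similarity(sent_a: str, sent_b: str) -> torch.Tensor:
    """Per-layer cosine similarity between two (ideally parallel) sentences."""
    acts_a = layer_activations(sent_a)
    acts_b = layer_activations(sent_b)
    return torch.nn.functional.cosine_similarity(acts_a, acts_b, dim=-1)

if __name__ == "__main__":
    sims = crosslingual_similarity("The cat sleeps.", "Le chat dort.")
    for layer, sim in enumerate(sims):
        print(f"layer {layer:2d}: cross-lingual cosine similarity = {sim:.3f}")
```

Plotting such a per-layer similarity curve for many parallel sentence pairs is one simple way to see at which depths representations of different languages converge or diverge.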