When Language Representations Interact: Separability and Cross-Lingual Effects in LLMs

Large language models exhibit strong multilingual capabilities, but their internal representations are difficult to interpret. Understanding how the representations of different languages interact is important for ensuring reliable behavior in multilingual systems. Recent work has shown that causal-geometric structure can explain how certain concepts are encoded as approximately linear and separable directions, but whether this framework extends to multilingual models, where language identity is correlated and hierarchical, remains underexplored. We apply causal-geometric analysis to multilingual LLMs, studying 28 bilingual contrasts across three models to determine when languages behave as approximately independent factors and when structured dependencies persist. We find evidence that language concepts admit stable linear representations that are largely separable under a covariance-adjusted (causal) inner product, with structured deviations that reflect linguistic similarity. Moreover, languages within the same family (such as Germanic or Romance) exhibit a simplex-like geometric structure, suggesting hierarchical organization. These results extend causal-geometric interpretability to multilingual settings and provide insight into how separability and similarity may coexist in multilingual LLM representations, motivating interpretability analyses that diagnose when and how structured dependencies between concepts can be anticipated. This has implications for trustworthy deployment, as residual structure between languages may lead to unintended cross-lingual effects when models are monitored or intervened upon.
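
To make the notion of separability under a covariance-adjusted inner product concrete, the following is a minimal sketch rather than the procedure used in this work. It assumes the inner product takes the form u^T Σ^{-1} v, with Σ estimated as the covariance of a model's unembedding vectors; the abstract does not spell out the definition, so this form is an assumption. The function names, the toy random "unembeddings", and the two direction vectors are hypothetical placeholders, not data or results from the paper.

```python
import numpy as np


def causal_inner_product(u, v, unembeddings):
    """Covariance-adjusted inner product u^T Sigma^{-1} v (assumed form).

    `unembeddings` is an (n_tokens, dim) array; Sigma is its sample covariance.
    """
    sigma = np.cov(unembeddings, rowvar=False)
    # Solve Sigma x = v instead of forming an explicit inverse.
    return float(u @ np.linalg.solve(sigma, v))


def causal_cosine(u, v, unembeddings):
    """Cosine similarity of u and v under the covariance-adjusted inner product."""
    uv = causal_inner_product(u, v, unembeddings)
    uu = causal_inner_product(u, u, unembeddings)
    vv = causal_inner_product(v, v, unembeddings)
    return uv / np.sqrt(uu * vv)


# Toy stand-in for an unembedding matrix with correlated coordinates.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(8, 8))
unembeddings = rng.normal(size=(2000, 8)) @ mixing

# Hypothetical directions for two bilingual contrasts (random here).
dir_contrast_a = rng.normal(size=8)
dir_contrast_b = rng.normal(size=8)

# A value near 0 would indicate approximate separability under this inner product.
print(causal_cosine(dir_contrast_a, dir_contrast_b, unembeddings))
```

Under this assumed form, two contrast directions count as separable when their adjusted cosine is close to zero, and systematic deviations from zero are the kind of residual structure the abstract attributes to linguistic similarity.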
