We present the first comprehensive evaluation of cross-lingual unlearning in multilingual LLMs. Using translated TOFU benchmarks spanning seven language/script variants, we test major unlearning algorithms and show that most fail to remove facts in languages other than the training language, even when model utility remains high. A subspace-projection method, however, consistently outperforms the alternatives, achieving strong cross-lingual forgetting with minimal utility degradation. Analysis of the learned task subspaces reveals a shared interlingua structure: removing this shared subspace harms all languages, while removing language-specific components selectively affects only the corresponding language. These results demonstrate that multilingual forgetting depends on geometry in weight space, motivating subspace-based approaches for future unlearning systems.
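To make the weight-space geometry concrete, the following is a minimal sketch of subspace-projection unlearning under the assumption that a task subspace is estimated from the top singular vectors of the fine-tuning weight delta; the function names, the toy shared/language-specific construction, and the rank choice are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (illustrative, not the paper's method): remove a learned task
# subspace from a fine-tuned weight matrix by orthogonal projection of the delta.
import numpy as np

def task_subspace(delta_w: np.ndarray, rank: int) -> np.ndarray:
    """Top-`rank` left singular vectors of a weight delta, used as an
    orthonormal basis for the task subspace (assumed estimator)."""
    u, _, _ = np.linalg.svd(delta_w, full_matrices=False)
    return u[:, :rank]

def project_out(w_ft: np.ndarray, w_base: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of the fine-tuning update that lies in the span
    of `basis`, keeping the remainder of the update intact."""
    delta = w_ft - w_base
    return w_base + delta - basis @ (basis.T @ delta)

# Toy example: two language-specific updates that share a low-rank component,
# standing in for the shared "interlingua" subspace discussed above.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64))
shared = rng.normal(size=(64, 4))                          # shared directions
d_en = shared @ rng.normal(size=(4, 64)) + 0.1 * rng.normal(size=(64, 64))
d_zh = shared @ rng.normal(size=(4, 64)) + 0.1 * rng.normal(size=(64, 64))
w_ft = w_base + d_en + d_zh

# Estimate the shared subspace from the concatenated per-language deltas
# (an assumption for illustration), then project it out of the update.
basis = task_subspace(np.hstack([d_en, d_zh]), rank=4)
w_unlearned = project_out(w_ft, w_base, basis)
print(np.linalg.norm(w_unlearned - w_base), "<", np.linalg.norm(w_ft - w_base))
```

Removing the shared basis shrinks the update for both languages at once, whereas projecting out a basis estimated from only one language's delta would mainly affect that language, which mirrors the selective effects described above.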