As multilingual large language models become more widely used, ensuring their safety and fairness across diverse linguistic contexts presents unique challenges. While existing research on machine unlearning has primarily focused on monolingual settings, typically English, multilingual environments introduce additional complexities due to cross-lingual knowledge transfer and biases embedded in both pretraining and fine-tuning data. In this work, we study multilingual unlearning using the Aya-Expanse 8B model under two settings: (1) data unlearning and (2) concept unlearning. We extend benchmarks for factual knowledge and stereotypes to ten languages through translation: English, French, Arabic, Japanese, Russian, Farsi, Korean, Hindi, Hebrew, and Indonesian. These languages span five language families and a wide range of resource levels. Our experiments show that unlearning in high-resource languages is generally more stable, with asymmetric transfer effects observed between typologically related languages. Furthermore, our analysis of linguistic distances indicates that syntactic similarity is the strongest predictor of cross-lingual unlearning behavior.