Large Language Models (LLMs) are susceptible to malicious influence by cyber attackers through intrusions such as adversarial, backdoor, and embedding inversion attacks. In response, the burgeoning field of LLM Security aims to study and defend against such threats. Thus far, the majority of work in this area has focused on monolingual English models; however, emerging research suggests that multilingual LLMs may be more vulnerable to various attacks than their monolingual counterparts. While previous work has investigated embedding inversion over a small subset of European languages, it is challenging to extrapolate these findings to languages from different linguistic families and with differing scripts. To this end, we explore the security of multilingual LLMs in the context of embedding inversion attacks and investigate cross-lingual and cross-script inversion across 20 languages, spanning over 8 language families and 12 scripts. Our findings indicate that languages written in Arabic and Cyrillic scripts are particularly vulnerable to embedding inversion, as are languages within the Indo-Aryan language family. We further observe that inversion models tend to suffer from language confusion, which can greatly reduce the efficacy of an attack. Accordingly, we systematically explore this bottleneck for inversion models, uncovering predictable patterns that attackers could leverage. Ultimately, this study aims to further the field's understanding of the outstanding security vulnerabilities facing multilingual LLMs and to raise awareness of the languages most at risk of negative impact from these attacks.
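To make the threat model concrete, the following is a minimal, self-contained sketch of the idea behind an embedding inversion attack, not the method studied in the paper: an attacker who observes a text's embedding vector (and can query the embedding function) attempts to recover the original text. Here the embedder is a hypothetical bag-of-characters hash standing in for an LLM encoder, and the "inversion model" is replaced by a simple nearest-neighbor search over candidate sentences; real attacks instead train a generative decoder on (embedding, text) pairs.

```python
# Toy embedding inversion sketch. All names (embed, invert, the candidate
# corpus) are illustrative assumptions, not the paper's implementation.
from math import sqrt


def embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an LLM embedding: normalized hashed
    character counts. Real targets would be transformer sentence encoders."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[hash(ch) % dim] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def invert(target_vec: list[float], candidates: list[str]) -> str:
    """Attacker side: return the candidate whose embedding lies closest
    (by cosine similarity) to the observed target vector."""
    def cosine(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    return max(candidates, key=lambda c: cosine(embed(c), target_vec))


# The victim embeds a private sentence; only the vector leaks.
secret = "the meeting is at noon"
leaked_vec = embed(secret)

# The attacker searches a guessed candidate pool for the best match.
candidates = [
    "the meeting is at noon",
    "send the report tomorrow",
    "call me later tonight",
]
recovered = invert(leaked_vec, candidates)
```

Because the secret sentence appears in the candidate pool, the nearest-neighbor search recovers it exactly; the paper's setting is harder, since the inversion model must generate unseen text in the right language and script, which is where the language-confusion bottleneck arises.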