Large Language Models (LLMs) are susceptible to malicious influence by cyber attackers through intrusions such as adversarial, backdoor, and embedding inversion attacks. In response, the burgeoning field of LLM Security aims to study and defend against such threats. Thus far, the majority of work in this area has focused on monolingual English models; however, emerging research suggests that multilingual LLMs may be more vulnerable to various attacks than their monolingual counterparts. While previous work has investigated embedding inversion over a small subset of European languages, it is challenging to extrapolate these findings to languages from different linguistic families and with differing scripts. To this end, we explore the security of multilingual LLMs in the context of embedding inversion attacks and investigate cross-lingual and cross-script inversion across 20 languages, spanning over 8 language families and 12 scripts. Our findings indicate that languages written in Arabic and Cyrillic scripts are particularly vulnerable to embedding inversion, as are languages within the Indo-Aryan language family. We further observe that inversion models tend to suffer from language confusion, which can greatly reduce the efficacy of an attack. Accordingly, we systematically explore this bottleneck for inversion models, uncovering predictable patterns that attackers could leverage. Ultimately, this study aims to further the field's understanding of the outstanding security vulnerabilities facing multilingual LLMs and to raise awareness of the languages most at risk of negative impact from these attacks.
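To make the threat model concrete, the following is a minimal, self-contained sketch of the idea behind an embedding inversion attack, not the method studied in the paper: an attacker who observes a text's embedding vector (and can query the embedding function) attempts to recover the original text. Here the embedder is a hypothetical bag-of-characters hash standing in for an LLM encoder, and the "inversion model" is replaced by a simple nearest-neighbor search over candidate sentences; real attacks instead train a generative decoder on (embedding, text) pairs.

```python
# Toy embedding inversion sketch. All names (embed, invert, the candidate
# corpus) are illustrative assumptions, not the paper's implementation.
from math import sqrt


def embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an LLM embedding: normalized hashed
    character counts. Real targets would be transformer sentence encoders."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[hash(ch) % dim] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def invert(target_vec: list[float], candidates: list[str]) -> str:
    """Attacker side: return the candidate whose embedding lies closest
    (by cosine similarity) to the observed target vector."""
    def cosine(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    return max(candidates, key=lambda c: cosine(embed(c), target_vec))


# The victim embeds a private sentence; only the vector leaks.
secret = "the meeting is at noon"
leaked_vec = embed(secret)

# The attacker searches a guessed candidate pool for the best match.
candidates = [
    "the meeting is at noon",
    "send the report tomorrow",
    "call me later tonight",
]
recovered = invert(leaked_vec, candidates)
```

Because the secret sentence appears in the candidate pool, the nearest-neighbor search recovers it exactly; the paper's setting is harder, since the inversion model must generate unseen text in the right language and script, which is where the language-confusion bottleneck arises.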