Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models

The advancement of large language models has significantly improved natural language processing. However, challenges such as jailbreaks (prompt injections that cause an LLM to follow instructions contrary to its intended use), hallucinations (generating incorrect or misleading information), and comprehension errors remain prevalent. In this report, we present a comparative analysis of fifteen distinct models. Each model undergoes a standardized test of 38 queries and is assessed on the total occurrences of three key metrics: jailbreaks, hallucinations, and comprehension errors. Our work exposes these models' inherent vulnerabilities and challenges the notion that they have reached human-level language comprehension. We have empirically analysed the impact of non-standard Unicode characters on the best-performing LLMs, including GPT-4, Gemini 1.5 Pro, LlaMA-3-70B, and Claude 3 Opus, and on their safeguarding mechanisms. By incorporating alphanumeric symbols from Unicode outside the standard Latin block, along with variants of characters in other languages, we observed a reduction in the efficacy of guardrails implemented through Reinforcement Learning from Human Feedback (RLHF). Consequently, these models exhibit heightened vulnerability to content policy breaches and prompt leakage. Our study also suggests a need to incorporate non-standard Unicode text in LLM training data to enhance the capabilities of these models.
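To make "alphanumeric symbols from Unicode outside the standard Latin block" concrete, the following is a minimal sketch rather than the procedure used in the study. It assumes the Mathematical Bold letters and digits (starting at U+1D400) as one representative set; the abstract does not say which blocks were tested. The function names `to_math_bold`, `non_ascii_alnum_ratio`, and `fold_to_ascii_forms` are hypothetical. The sketch also shows that NFKC compatibility normalization maps these particular symbols back to plain ASCII, which is one way such text could be detected or canonicalized before further processing.

```python
import unicodedata

# Code points in the Mathematical Alphanumeric Symbols block.
# Assumption: the bold set stands in for "non-standard" alphanumerics;
# the source abstract does not name specific blocks.
BOLD_UPPER = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A  # MATHEMATICAL BOLD SMALL A
BOLD_DIGIT = 0x1D7CE  # MATHEMATICAL BOLD DIGIT ZERO


def to_math_bold(text: str) -> str:
    """Replace ASCII letters and digits with their Mathematical Bold look-alikes."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(BOLD_DIGIT + ord(ch) - ord("0")))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)


def non_ascii_alnum_ratio(text: str) -> float:
    """Fraction of alphanumeric characters that lie outside ASCII."""
    alnum = [ch for ch in text if ch.isalnum()]
    if not alnum:
        return 0.0
    return sum(1 for ch in alnum if ord(ch) > 0x7F) / len(alnum)


def fold_to_ascii_forms(text: str) -> str:
    """Apply NFKC, which folds compatibility characters to their base forms."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    plain = "Hello 42"
    styled = to_math_bold(plain)
    print(styled)                          # visually similar, different code points
    print(non_ascii_alnum_ratio(styled))   # share of non-ASCII alphanumerics
    print(fold_to_ascii_forms(styled) == plain)
```

NFKC only covers characters that carry a compatibility decomposition. Look-alike letters from other scripts, such as Cyrillic "а", are distinct characters and are not folded by it, so this sketch illustrates the idea rather than a complete safeguard.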

