Large language models (LLMs) have achieved remarkable performance on a wide range of tasks. However, recent studies have shown that LLMs can memorize training data, and that simple repeated tokens can trick a model into leaking that data. In this paper, we go a step further and show that certain special characters, alone or combined with English letters, are stronger memory triggers that lead to more severe data leakage. The intuition is that, because LLMs are trained on massive corpora containing a substantial number of special characters (e.g., the structural symbols { and } in JSON files, and @ and # in emails and online posts), a model may memorize the co-occurrence between these special characters and the raw text. This motivates us to propose a simple but effective Special Characters Attack (SCA) to induce training data leakage. Our experiments verify the effectiveness of SCA against state-of-the-art LLMs: they can leak diverse training data, such as code corpora, web pages, and personally identifiable information, and sometimes generate non-stop outputs as a byproduct. We further show that the composition of the training corpus can be revealed by inspecting the leaked data, a crucial piece of information for pre-training high-performance LLMs. Our work can help in understanding the sensitivity of LLMs to special characters and in identifying potential areas for improvement.
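As a rough illustration of the kind of input the abstract describes, the following sketch builds probe strings from special characters, either repeated alone or mixed with English letters. It is not the construction used in the paper: the character pool is limited to the four symbols named above, and the function names, sequence length, and uniform sampling are all assumptions made for this example.

```python
import random

# Assumed character pools: the abstract names only {, }, @ and # as examples.
STRUCTURAL_SYMBOLS = ["{", "}", "@", "#"]
ENGLISH_LETTERS = list("abcdefghijklmnopqrstuvwxyz")


def repeated_symbol_probe(symbol: str, length: int) -> str:
    """Return a single symbol repeated `length` times (hypothetical helper)."""
    return symbol * length


def mixed_probe(length: int, rng: random.Random) -> str:
    """Return `length` characters drawn uniformly from symbols and letters.

    Uniform sampling is an assumption; the abstract does not say how
    special characters and letters are combined.
    """
    pool = STRUCTURAL_SYMBOLS + ENGLISH_LETTERS
    return "".join(rng.choice(pool) for _ in range(length))


def build_probes(length: int, seed: int = 0) -> list[str]:
    """Build one repeated-symbol probe per symbol, plus one mixed probe."""
    rng = random.Random(seed)  # seeded so the probe set is reproducible
    probes = [repeated_symbol_probe(s, length) for s in STRUCTURAL_SYMBOLS]
    probes.append(mixed_probe(length, rng))
    return probes


if __name__ == "__main__":
    for probe in build_probes(length=16):
        print(repr(probe))
```

Each probe would then be sent to the model under evaluation and its output inspected for memorized text. That step is omitted here because it depends on the model interface.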