This work investigates the resilience of contemporary large language models (LLMs) to common character-level perturbations. We examine three types of perturbation: introducing numerous typos within words, shuffling the characters of each word, and inserting large numbers of invisible characters into the text. Surprisingly, even under severe perturbation, such as shuffling nearly every word character-wise to produce text that is almost unreadable to humans, or inserting several times as many invisible characters as visible ones as noise, many LLMs still maintain notable performance. We explore the underlying causes of this robustness and find that LLMs are remarkably resilient to chaotic segmentation and fragmented tokenization. Furthermore, we examine the mechanisms, both implicit and explicit, by which LLMs strip away character-level perturbations to comprehend text correctly. We hope that our findings on the low-level robustness of LLMs will unveil their inherent architectural strengths, reveal the potential risks of their misuse, and inform the reliable deployment of LLMs across diverse application scenarios.
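The three perturbation types named above can be sketched as follows. This is a minimal illustration, not the paper's actual perturbation pipeline; the function names, the substitution-based typo model, the perturbation rates, and the choice of U+200B (zero-width space) as the invisible character are all assumptions for demonstration.

```python
import random
import string

random.seed(0)  # for reproducibility of this sketch

def add_typos(text, rate=0.3):
    # Hypothetical typo model: randomly substitute alphabetic characters.
    chars = list(text)
    for i, c in enumerate(chars):
        if c.isalpha() and random.random() < rate:
            chars[i] = random.choice(string.ascii_lowercase)
    return "".join(chars)

def shuffle_words(text):
    # Shuffle the characters of every whitespace-separated word.
    def shuf(word):
        cs = list(word)
        random.shuffle(cs)
        return "".join(cs)
    return " ".join(shuf(w) for w in text.split())

def insert_invisible(text, factor=3):
    # Insert `factor` zero-width spaces (U+200B) after every visible
    # character, so invisible characters outnumber visible ones.
    return "".join(c + "\u200b" * factor for c in text)

sample = "large language models are robust"
print(shuffle_words(sample))       # e.g. character-scrambled words
print(insert_invisible("ok"))      # looks like "ok" but holds 8 code points
```

Each function leaves word boundaries (or, for invisible insertion, the visible rendering) intact, which mirrors why such text can remain readable to a model while its tokenization is severely fragmented.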