This work investigates the resilience of contemporary LLMs to frequent, structured character-level perturbations, specifically the insertion of a noisy character after every input character. We introduce UCC-Inj, a practical method that inserts invisible Unicode control characters into text to discourage LLM misuse in scenarios such as online exam systems. Surprisingly, despite strong obfuscation that fragments tokenization and substantially reduces the signal-to-noise ratio, many LLMs still maintain notable performance. Through a comprehensive evaluation across model-, problem-, and noise-related configurations, we examine the extent and mechanisms of this robustness, probing both how character-level tokenization is handled and the hypotheses of implicit versus explicit denoising of character-level noise. We hope our findings on the low-level robustness of LLMs shed light on the risks of their misuse and on the reliability of deploying LLMs across diverse applications.
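To make the perturbation concrete, the following is a minimal Python sketch of a UCC-Inj-style injection. The function name `inject_invisible` and the choice of U+200B (ZERO WIDTH SPACE) as the inserted character are illustrative assumptions, not the paper's exact implementation or character set.

```python
# Illustrative sketch of a UCC-Inj-style perturbation (assumed implementation,
# not the authors' exact code): insert an invisible Unicode character after
# every character of the input text. U+200B is used here as an assumed stand-in
# for the paper's invisible Unicode control characters.

def inject_invisible(text: str, noise_char: str = "\u200b") -> str:
    """Return `text` with `noise_char` inserted after each character."""
    return "".join(ch + noise_char for ch in text)

if __name__ == "__main__":
    prompt = "What is 2 + 2?"
    noisy = inject_invisible(prompt)
    # The noisy prompt renders identically on screen, but doubles the
    # character count and fragments subword tokenization when fed to an LLM.
    print(repr(noisy))
    assert len(noisy) == 2 * len(prompt)
```

Because the injected characters are invisible when rendered, copy-pasted exam questions carry the perturbation silently while remaining readable to humans.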