Emoticons are widely used in digital communication to convey affective intent, yet their safety implications for Large Language Models (LLMs) remain largely unexplored. In this paper, we identify emoticon semantic confusion, a vulnerability in which LLMs misinterpret ASCII-based emoticons and consequently perform unintended, even destructive, actions. To systematically study this phenomenon, we develop an automated data-generation pipeline and construct a dataset of 3,757 code-oriented test cases spanning 21 meta-scenarios, four programming languages, and varying contextual complexities. Our study of six LLMs reveals that emoticon semantic confusion is pervasive, with an average confusion ratio exceeding 38%. More critically, over 90% of confused responses are 'silent failures': outputs that are syntactically valid yet deviate from user intent, potentially leading to destructive security consequences. Furthermore, we observe that this vulnerability transfers readily to popular agent frameworks, while existing prompt-based mitigations remain largely ineffective. We call on the community to recognize this emerging vulnerability and to develop effective mitigations that uphold the safety and reliability of LLM systems.
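To make the notion of a 'silent failure' concrete, the following is a minimal sketch of how such responses could be distinguished from loud failures: a response parses as valid code yet behaves differently from the user's intent. All names here (`classify_response`, `clean`) are illustrative assumptions, not the paper's actual evaluation harness.

```python
import ast

def classify_response(code: str, reference_fn, test_input):
    """Label one model response (illustrative sketch, not the paper's harness)."""
    try:
        tree = ast.parse(code)  # loud failure: the output does not even parse
    except SyntaxError:
        return "syntax_error"
    ns = {}
    exec(compile(tree, "<llm-output>", "exec"), ns)
    # Grab the first user-defined callable produced by the executed code.
    fn = [v for k, v in ns.items() if not k.startswith("__") and callable(v)][0]
    if fn(test_input) == reference_fn(test_input):
        return "faithful"
    return "silent_failure"  # parses and runs, but deviates from user intent

# Hypothetical confused case: the user wants the trailing smiley stripped,
# but the model's output is valid Python that returns an empty string.
intent = lambda s: s.replace(":)", "").strip()
confused = "def clean(s):\n    return ''"
print(classify_response(confused, intent, "done :)"))  # → silent_failure
```

The key design point is that a syntax check alone would pass the confused response; only comparing behavior against the intended reference exposes the deviation, which is why such failures are easy to miss in practice.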