Large language models (LLMs) are increasingly used in situations where human values are at stake, such as decision-making tasks that, when performed by humans, involve reasoning. We investigate the so-called reasoning capabilities of LLMs over novel symbolic representations by introducing an experimental framework that tests their ability to process and manipulate unfamiliar symbols. We introduce semantic deceptions: situations in which symbols carry misleading semantic associations due to their form or the contexts in which they are embedded. These are designed to probe whether LLMs can maintain symbolic abstraction or whether they default to exploiting learned semantic associations. We redefine standard digits and mathematical operators using novel symbols, and task LLMs with solving simple calculations expressed in this altered notation. The objectives are twofold: (1) to assess LLMs' capacity to abstract and manipulate arbitrary symbol systems, and (2) to evaluate their ability to resist misleading semantic cues that conflict with the task's symbolic logic. Through experiments with four LLMs, we show that semantic cues can significantly degrade reasoning models' performance on very simple tasks. These results reveal limitations in current LLMs' capacity for symbolic manipulation, highlight a tendency to over-rely on surface-level semantics, and suggest that chain-of-thought reasoning may amplify reliance on statistical correlations. Even when LLMs appear to follow instructions correctly, semantic cues still impair basic capabilities. These limitations raise ethical and societal concerns: they undermine the widespread and pernicious tendency to attribute reasoning abilities to LLMs, and they indicate how LLMs may fail in decision-making contexts where robust symbolic reasoning is essential and should not be compromised by residual semantic associations inherited from the model's training.
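To make the setup concrete, the following is a minimal sketch of a symbol-substitution harness of the kind described above. The specific glyphs, the digit-shift used for the deceptive condition, and the prompt wording are illustrative assumptions, not the paper's actual notation or prompts.

```python
# Minimal sketch of the symbol-substitution setup (illustrative assumptions).
# Two hypothetical conditions: a neutral remapping onto arbitrary glyphs, and
# a "deceptive" one where familiar digit glyphs are reassigned, so that the
# surface reading of an expression conflicts with its symbolic meaning.
NEUTRAL_MAP = {str(d): g for d, g in enumerate("@#$%&*~^?!")}
DECEPTIVE_MAP = {str(d): str((d + 3) % 10) for d in range(10)}

def encode(expr: str, mapping: dict) -> str:
    """Rewrite a standard arithmetic expression in the altered notation."""
    return "".join(mapping.get(ch, ch) for ch in expr)

def decode(text: str, mapping: dict) -> str:
    """Map a model answer in the altered notation back to standard digits."""
    rev = {v: k for k, v in mapping.items()}
    return "".join(rev.get(ch, ch) for ch in text)

def build_prompt(expr: str, mapping: dict) -> str:
    """Assemble a prompt that states the redefinitions, then poses the task."""
    legend = ", ".join(f"'{v}' denotes {k}" for k, v in mapping.items())
    return (f"Redefined digits: {legend}. "
            f"Compute {encode(expr, mapping)} and answer in the same notation.")

def is_correct(expr: str, model_output: str, mapping: dict) -> bool:
    """Score an answer by decoding it and comparing to the true value."""
    return decode(model_output.strip(), mapping) == str(eval(expr))  # trusted arithmetic input only

# Under DECEPTIVE_MAP, "23+45" renders as "56+78"; the correct answer 68
# must be written back as "91", conflicting with the surface reading.
if __name__ == "__main__":
    expr = "23+45"
    print(build_prompt(expr, DECEPTIVE_MAP))
    print(is_correct(expr, "91", DECEPTIVE_MAP))  # sanity check: True
```

A model that maintains symbolic abstraction would answer "91" here; a model exploiting surface semantics is pulled toward treating "56+78" at face value, which is the failure mode the deceptive condition is meant to expose.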