Large Language Models (LLMs) often struggle with deductive judgment in syllogistic reasoning, systematically conflating semantic plausibility with formal validity, a phenomenon known as the content effect. This bias persists even when models generate step-wise explanations, indicating that intermediate rationales may inherit the same semantic shortcuts that shape final answers. Recent approaches propose mitigating this issue by increasing inference-time structural constraints, either by encouraging abstract intermediate representations or by intervening directly in the model's internal computations; however, reliably suppressing semantic interference remains an open challenge. To make formal deduction less sensitive to semantic content, we introduce a framework for abstraction-guided reasoning that explicitly separates structural inference from lexical semantics. We construct paired content-laden and abstract syllogisms and use the model's activations on abstract inputs to define an abstract reasoning space. We then learn lightweight Abstractors that, from content-conditioned residual-stream states, predict representations aligned with this space, and we integrate these predictions via multi-layer interventions during the forward pass. Using cross-lingual transfer as a test bed, we show that abstraction-aligned steering reduces content-driven errors and improves validity-sensitive performance. Our results position activation-level abstraction as a scalable mechanism for making formal reasoning in LLMs more robust to semantic interference.
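The abstract describes the steering mechanism only at a high level; the minimal PyTorch sketch below shows one plausible way such abstraction-aligned, multi-layer interventions could be wired into a transformer's residual stream via forward hooks. Every concrete choice here is an assumption rather than the paper's specification: the Abstractor as a single linear map, the mixing weight `alpha`, the GPT-2-style attribute path `model.transformer.h`, and the MSE alignment objective are all illustrative.

```python
import torch
import torch.nn as nn

class Abstractor(nn.Module):
    """Lightweight map from content-conditioned residual-stream states to
    the abstract reasoning space. A single linear layer is a hypothetical
    choice; the abstract only says the Abstractors are 'lightweight'."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.proj(h)

def make_steering_hook(abstractor: Abstractor, alpha: float = 0.5):
    """Forward hook that blends the predicted abstract representation into
    the layer's residual-stream output; `alpha` is a hypothetical mixing
    weight controlling intervention strength."""
    def hook(module, inputs, output):
        # GPT-2-style blocks return a tuple whose first element is the
        # hidden states; plain modules return the tensor directly.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = (1 - alpha) * hidden + alpha * abstractor(hidden)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

def attach_abstractors(model, layer_indices, d_model):
    """Multi-layer intervention: register one Abstractor per chosen layer.
    Assumes a GPT-2-style layer list at `model.transformer.h` and that the
    Abstractors share the model's device and dtype."""
    handles = []
    for i in layer_indices:
        abstractor = Abstractor(d_model)
        handles.append(
            model.transformer.h[i].register_forward_hook(
                make_steering_hook(abstractor)
            )
        )
    return handles  # call .remove() on each handle to detach

# Training (sketch): for each paired content-laden / abstract syllogism,
# one could minimize ||Abstractor(h_content) - h_abstract||^2 at each
# intervened layer, where h_abstract comes from a clean forward pass on
# the abstract input -- an assumed objective consistent with "predict
# representations aligned with this space".
```

As a usage note, one might call `attach_abstractors(model, layer_indices=[8, 12, 16], d_model=model.config.hidden_size)` after training the Abstractors, then run inference as usual; the specific layers to intervene on are likewise not given in the abstract.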