Distortion Instead of Hallucination: The Effect of Reasoning Under Strict Constraints

With the widespread adoption of large language models (LLMs), hallucinations (non-factual fabrications in model outputs) have become a serious concern. Reasoning capabilities have received attention as a self-verification process that could improve output reliability. However, the effect of reasoning within a closed system, where LLMs cannot rely on external tools or knowledge, has yet to be clarified. We therefore conduct experiments on a strictly constrained task (recommending peer-reviewed journal articles in computer science) to examine the effect of reasoning in two models, GPT-5.2 and Gemini 3 Flash. Our results reveal a problematic trade-off between constraint compliance and factual accuracy. Non-reasoning models exhibit high constraint violation rates (66-75%) but maintain factual accuracy, whereas reasoning models reduce violations (13-26%) but systematically distort known facts to satisfy the constraints and produce more complete fabrications. This trade-off pattern is consistent across both models despite their different architectures, indicating a fundamental limitation of reasoning. Furthermore, reasoning does not uniformly improve output authenticity: its effects diverge by model, reflecting different allocations of the compliance-truthfulness trade-off. These findings challenge the assumption that reasoning universally improves reliability: reasoning models trade honest constraint violations for detection-resistant distortions.
