Code over Words: Overcoming Semantic Inertia via Code-Grounded Reasoning

LLMs struggle with Semantic Inertia: the inability to inhibit pre-trained priors (e.g., "Lava is Dangerous") when dynamic, in-context rules contradict them. We probe this phenomenon using Baba Is You, a game in which physical laws are mutable text rules, enabling precise evaluation of a model's ability to override learned priors when the rules change. We observe quantitatively that larger models can exhibit inverse scaling: they perform worse than smaller models when natural language reasoning requires suppressing pre-trained associations (e.g., accepting "Lava is Safe"). Our analysis attributes this to natural language encoding, which entangles descriptive semantics with logical rules and leads to persistent hallucinations of familiar physics despite explicit contradictory rules. Here we show that representing dynamics as executable code, rather than descriptive text, reverses this trend and enables effective prior inhibition. We introduce Code-Grounded Vistas (LCV), which fine-tunes models on counterfactual pairs and identifies states with contradictory rules, thereby forcing attention to logical constraints rather than visual semantics. This training-time approach outperforms expensive inference-time search methods in both efficiency and accuracy. Our results demonstrate that representation fundamentally determines whether scaling improves or impairs contextual reasoning. This challenges the assumption that larger models are universally better, with implications for domains that require dynamic overriding of learned priors.
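To make the contrast between descriptive text and executable rules concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical rule format of (noun, property) pairs and hypothetical names (`State`, `step_outcome`, `counterfactual_pair`); the "SAFE" property simply mirrors the abstract's "Lava is Safe" example. The point it shows is that, once dynamics are written as code, the outcome of a move depends only on the active rule set, so the word "LAVA" carries no built-in meaning.

```python
from dataclasses import dataclass

# Hypothetical rule format (an assumption for this sketch):
# each rule is a (noun, property) pair such as ("LAVA", "DEFEAT").


@dataclass(frozen=True)
class State:
    rules: frozenset   # active (noun, property) pairs
    player_on: str     # noun of the tile the player steps onto


def has_property(state: State, noun: str, prop: str) -> bool:
    """A noun has a property only if an active rule says so."""
    return (noun, prop) in state.rules


def step_outcome(state: State) -> str:
    """Outcome is derived from the active rules, never from the noun's name."""
    if has_property(state, state.player_on, "DEFEAT"):
        return "lose"
    if has_property(state, state.player_on, "WIN"):
        return "win"
    return "continue"


def counterfactual_pair(state: State, noun: str, prop_a: str, prop_b: str):
    """Build two states that differ only in one property assigned to `noun`."""
    base = {r for r in state.rules
            if not (r[0] == noun and r[1] in (prop_a, prop_b))}
    first = State(frozenset(base | {(noun, prop_a)}), state.player_on)
    second = State(frozenset(base | {(noun, prop_b)}), state.player_on)
    return first, second


start = State(frozenset({("FLAG", "WIN")}), player_on="LAVA")
familiar, flipped = counterfactual_pair(start, "LAVA", "DEFEAT", "SAFE")

print(step_outcome(familiar))  # rule matches the prior: lose
print(step_outcome(flipped))   # rule contradicts the prior: continue
```

The two states in the pair share every rule except the one attached to lava, so any difference in the predicted outcome has to come from reading the rule rather than from prior associations with the noun. This is one plausible reading of what a counterfactual pair could look like; the abstract does not specify the actual data format.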
