Conflict-Aware Fusion: Mitigating Logic Inertia in Large Language Models via Structured Cognitive Priors

Large language models (LLMs) achieve high accuracy on many reasoning benchmarks but remain brittle under structural perturbations of rule-based systems. We introduce a diagnostic framework with four stress tests -- redundant vs. essential rule deletion, contradictory-rule injection, logic-preserving rewrites, and multi-law stacking -- and use it to expose Logic Inertia: the tendency of generative LLMs (Qwen2/3, TinyLlama, GPT-4o, Gemma-3-4B-IT) and the encoder-only BERT baseline to persist along learned deductive trajectories under inconsistent premises. The collapse is sharp: untreated baselines fall from accuracy 1.00 on the base task to 0.00 on contradiction injection (instance-level exact match), and GPT-4o resolves only 56.0% of contradiction cases. We propose Conflict-Aware Fusion, a four-stage training pipeline that enforces verification-before-deduction as a learned structural prior: (i) SFT establishes the verification preamble; (ii) DPO sharpens the halt-on-contradiction decision boundary; (iii) Logical Invariance REgularisation (LIRE) penalises divergence between logically equivalent rule formulations via symmetric KL; (iv) Reinforcement Learning from Verification Feedback (RLVF) uses a symbolic forward-chaining engine as a deterministic oracle reward, jointly optimising invariance and sensitivity. The pipeline saturates all four primary stress tests for both 1.5B and 8B backbones. We further validate a Phase 2 extension that replaces the propositional oracle with a Lean 4 kernel, attaining 99.0% kernel agreement on the 105 classically-derivable (T) questions within a stratified 187-question Lean-translated sample (overall 71.7% across both polarities), providing a sound upgrade path to formally verified RL training. Code and benchmark: https://github.com/14H034160212/lemo

翻译：大语言模型（LLMs）在诸多推理基准上取得了高准确率，但在基于规则的系统中面对结构扰动时仍然脆弱。我们引入了一个包含四项压力测试的诊断框架——冗余规则删除与必需规则删除、矛盾规则注入、逻辑保持性重写以及多定律叠加——并利用该框架揭示了"逻辑惯性"现象：生成式大语言模型（Qwen2/3、TinyLlama、GPT-4o、Gemma-3-4B-IT）以及仅编码器结构的BERT基线，在不一致前提条件下，倾向于沿习得的演绎轨迹持续推理。性能坍塌现象显著：未经处理的基线在基础任务上的准确率从1.00骤降至矛盾注入任务中的0.00（实例级精确匹配），而GPT-4o仅能解决56.0%的矛盾案例。我们提出冲突感知融合方法——一种四阶段训练流程，将"验证前置演绎"作为习得的结构化先验加以强化：（i）监督微调建立验证前缀；（ii）直接偏好优化明确矛盾时中止决策边界；（iii）逻辑不变性正则化通过对称KL散度惩罚逻辑等价规则表述之间的差异；（iv）基于验证反馈的强化学习利用符号前向链接引擎作为确定性预言机奖励，联合优化不变性与敏感性。该流程在1.5B和8B两种骨干网络上均饱和通过全部四项主要压力测试。我们进一步验证了第二阶段扩展方案：将命题逻辑预言机替换为Lean 4内核，在包含105个经典可推导（T）问题的分层187题Lean翻译样本上达到99.0%的内核一致性（双极性总体准确率71.7%），为基于形式化验证的强化学习训练提供了可靠的升级路径。代码与基准测试平台：https://github.com/14H034160212/lemo