A fundamental challenge in reasoning is navigating hypothetical, counterfactual worlds where logic may conflict with ingrained knowledge. We investigate this frontier for Large Language Models (LLMs) by asking: Can LLMs reason logically when the context contradicts their parametric knowledge? To facilitate a systematic analysis, we first introduce CounterLogic, a benchmark specifically designed to disentangle logical validity from knowledge alignment. Evaluation of 11 LLMs across six diverse reasoning datasets reveals a consistent failure: model accuracy drops by an average of 14% in counterfactual scenarios compared to knowledge-aligned ones. We hypothesize that this gap stems not from a flaw in logical processing, but from an inability to manage the cognitive conflict between context and knowledge. Inspired by human metacognition, we propose a simple yet powerful intervention: Flag & Reason (FaR), in which models are first prompted to flag potential knowledge conflicts before they reason. This metacognitive step is highly effective, narrowing the performance gap to just 7% and increasing overall accuracy by 4%. Our findings diagnose a critical limitation in modern LLMs' reasoning and demonstrate how metacognitive awareness can make them more robust and reliable thinkers.
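To make the Flag & Reason (FaR) intervention concrete, the sketch below shows one plausible two-step prompting loop: the model is first asked only to flag premises that conflict with its world knowledge, and its answer is then fed back into a second prompt that asks for a validity judgment under the assumption that all premises hold. The exact prompt wording and the `call_llm` helper are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a FaR-style two-step prompt (assumed wording, not the
# authors' released code). `call_llm` is a hypothetical stand-in for any
# function that sends a prompt string to an LLM and returns its reply.

from typing import Callable


def flag_and_reason(call_llm: Callable[[str], str],
                    premises: str, conclusion: str) -> str:
    # Step 1: flag potential knowledge conflicts before any reasoning.
    flag_prompt = (
        "Read the premises below. List any statements that conflict with "
        "common world knowledge. Do not judge the argument yet.\n\n"
        f"Premises:\n{premises}\n"
    )
    flags = call_llm(flag_prompt)

    # Step 2: reason about logical validity, treating every premise as true
    # even where it was flagged as counterfactual.
    reason_prompt = (
        "You previously noted these possible knowledge conflicts:\n"
        f"{flags}\n\n"
        "Now assume every premise is true, even if counterfactual, and decide "
        "whether the conclusion follows logically. Answer 'valid' or 'invalid'.\n\n"
        f"Premises:\n{premises}\nConclusion:\n{conclusion}\n"
    )
    return call_llm(reason_prompt)
```

Separating the flagging step from the reasoning step is what makes the intervention metacognitive: the model surfaces the context-knowledge conflict explicitly before being asked to set that knowledge aside and judge validity from the stated premises alone.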