As large language models (LLMs) are increasingly integrated into fact-checking pipelines, formal logic is often proposed as a rigorous means by which to mitigate bias, errors and hallucinations in these models' outputs. For example, some neurosymbolic systems verify claims by using LLMs to translate natural language into logical formulae and then checking whether the proposed claims are logically sound, i.e., whether they can be validly derived from premises that are verified to be true. We argue that such approaches structurally fail to detect misleading claims due to systematic divergences between conclusions that are logically sound and inferences that humans typically make and accept. Drawing on studies in cognitive science and pragmatics, we present a typology of cases in which logically sound conclusions systematically elicit human inferences that are unsupported by the underlying premises. Consequently, we advocate for a complementary approach: leveraging the human-like reasoning tendencies of LLMs as a feature rather than a bug, and using these models to validate the outputs of formal components in neurosymbolic systems against potentially misleading conclusions.