How does a cause lead to an effect, and which intermediate causal steps explain their connection? This work scrutinizes the mechanistic causal reasoning capabilities of large language models (LLMs) to answer these questions through the task of implicit causal chain discovery. In a diagnostic evaluation framework, we instruct nine LLMs to generate all possible intermediate causal steps linking given cause-effect pairs in causal chain structures. These pairs are drawn from recent resources in argumentation studies featuring polarized discussion on climate change. Our analysis reveals that LLMs vary in the number and granularity of causal steps they produce. Although they are generally self-consistent and confident about the intermediate causal connections in the generated chains, their judgments are mainly driven by associative pattern matching rather than genuine causal reasoning. Nonetheless, human evaluations confirm the logical coherence and integrity of the generated chains. Our baseline causal chain discovery approach, insights from our diagnostic evaluation, and benchmark dataset with causal chains lay a solid foundation for advancing future work in implicit, mechanistic causal reasoning in argumentation settings.