Causal reasoning ability is crucial for numerous NLP applications. Despite ChatGPT's impressive emergent abilities across various NLP tasks, it is unclear how well it performs in causal reasoning. In this paper, we conduct the first comprehensive evaluation of ChatGPT's causal reasoning capabilities. Experiments show that ChatGPT is not a good causal reasoner, but a good causal interpreter. Moreover, ChatGPT suffers from serious hallucination in causal reasoning, possibly due to the reporting biases between causal and non-causal relationships in natural language, as well as ChatGPT's upgrading processes, such as RLHF. The In-Context Learning (ICL) and Chain-of-Thought (CoT) techniques can further exacerbate this causal hallucination. Additionally, ChatGPT's causal reasoning ability is sensitive to the words used to express the causal concept in prompts, and it performs better with close-ended prompts than with open-ended ones. For events in sentences, ChatGPT is better at capturing explicit causality than implicit causality, and it performs better on sentences with lower event density and smaller lexical distance between events.