Causal reasoning is a cornerstone of how humans interpret the world. Causal graphs offer a concise yet effective way to model and reason about causality. Given the impressive advances in language models, a crucial question arises: can they really understand causal graphs? To answer this question, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we develop a framework that defines causal graph understanding by assessing language models' behaviors against four practical criteria derived from diverse disciplines (e.g., philosophy and psychology). We then construct CLEAR, a novel benchmark that defines three complexity levels and encompasses 20 causal graph-based tasks across these levels. Finally, based on our framework and benchmark, we conduct extensive experiments on six leading language models and summarize five empirical findings. Our results indicate that while language models demonstrate a preliminary understanding of causal graphs, significant room for improvement remains. Our project website is at https://github.com/OpenCausaLab/CLEAR.