Causality is fundamental to human cognition and has drawn attention across diverse research fields. As the volume of textual data grows, discerning causal relations in text becomes crucial, and causal text mining plays a pivotal role in extracting meaningful patterns. This study comprehensively evaluates ChatGPT's causal text mining capabilities. First, we introduce a benchmark that extends beyond general English datasets to include domain-specific and non-English datasets. Second, we provide an evaluation framework to ensure fair comparisons between ChatGPT and previous approaches. Finally, our analysis outlines the limitations and future challenges of employing ChatGPT for causal text mining. Specifically, our analysis reveals that ChatGPT serves as a good starting point for various datasets. However, given a sufficient amount of training data, previous models still surpass ChatGPT's performance. Additionally, ChatGPT tends to falsely recognize non-causal sequences as causal sequences. These issues become even more pronounced with advanced versions of the model, such as GPT-4. We also highlight the constraints of ChatGPT in handling complex causality types, including both intra-/inter-sentential and implicit causality. The model further faces challenges in effectively leveraging in-context learning and domain adaptation. We release our code to support further research and development in this field.