Many professions rely on news to learn about past and present causal events in order to make informed decisions and predictions about the future. With the ever-increasing amount of news and text available on the internet, there is a need to automate the extraction of causal events from unstructured text. In this work, we propose a methodology to construct causal knowledge graphs (KGs) from news in two steps: (1) Extraction of Causal Relations, and (2) Argument Clustering and Representation into KG. We aim to build graphs that emphasize recall, precision, and interpretability. For extraction, although many earlier works already construct causal KGs from text, most adopt rudimentary pattern-based methods. We close this gap by using the latest BERT-based extraction models alongside pattern-based ones. As a result, we achieve high recall while maintaining high precision. For clustering, we use a topic modelling approach to cluster our arguments, so as to increase the connectivity of our graph. As a result, instead of 15,686 disconnected subgraphs, we obtain a single connected graph from which users can infer more causal relationships. Our final KG effectively captures and conveys causal relationships, as validated through multiple use cases and user feedback.
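To illustrate why clustering arguments increases connectivity, the following is a minimal sketch, not the pipeline used in this work. It assumes causal relations are stored as (cause, effect) string pairs and that each argument has already been assigned a cluster label. The example sentences, the `cluster_of` mapping, the label names, and both helper functions are hypothetical, and the hand-written mapping stands in for the output of a topic model.

```python
from collections import defaultdict


def count_components(edges):
    """Count weakly connected components of a graph given as (cause, effect) pairs."""
    adjacency = defaultdict(set)
    for cause, effect in edges:
        # Direction is ignored here: we only care whether nodes are linked at all.
        adjacency[cause].add(effect)
        adjacency[effect].add(cause)

    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        # Depth-first traversal marks every node reachable from `start`.
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def cluster_edges(edges, cluster_of):
    """Replace each argument with its cluster label; unmapped arguments are kept as-is."""
    return [(cluster_of.get(c, c), cluster_of.get(e, e)) for c, e in edges]


# Toy, invented relations: three extracted pairs that share no exact argument string.
raw_edges = [
    ("oil prices rose", "airline shares fell"),
    ("crude oil became more expensive", "inflation increased"),
    ("higher inflation", "central bank raised rates"),
]

# Hypothetical cluster assignment (a stand-in for topic-model output).
cluster_of = {
    "oil prices rose": "TOPIC_OIL_PRICE_UP",
    "crude oil became more expensive": "TOPIC_OIL_PRICE_UP",
    "inflation increased": "TOPIC_INFLATION_UP",
    "higher inflation": "TOPIC_INFLATION_UP",
}

before = count_components(raw_edges)
after = count_components(cluster_edges(raw_edges, cluster_of))
print(before, after)
```

In this toy case the three raw relations form three separate subgraphs, because no two arguments match as strings. Once paraphrased arguments are mapped to a shared cluster node, the relations chain together into one component, which is the effect the clustering step is meant to achieve at scale.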