Many financial jobs rely on news to learn about causal events in the past and present, and to make informed decisions and predictions about the future. With the ever-increasing amount of news available online, there is a need to automate the extraction of causal events from unstructured text. In this work, we propose a methodology to construct causal knowledge graphs (KGs) from news in two steps: (1) extraction of causal relations, and (2) argument clustering and representation as a KG. We aim to build graphs that emphasize recall, precision, and interpretability. For extraction, although many earlier works already construct causal KGs from text, most adopt rudimentary pattern-based methods. We close this gap by using the latest BERT-based extraction models alongside pattern-based ones. As a result, we achieve high recall while maintaining high precision. For clustering, we use a topic modelling approach to group the extracted arguments and thereby increase the connectivity of the graph. As a result, instead of 15,686 disconnected subgraphs, we obtain a single connected graph from which users can infer more causal relationships. Our final KG effectively captures and conveys causal relationships, as validated through experiments, multiple use cases, and user feedback.
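To illustrate why clustering arguments increases connectivity, the following is a minimal sketch, not the method used in this work. It assumes causal relations have already been extracted as (cause, effect) text pairs and that each argument has been assigned a cluster label. The example sentences, the `cluster_of` mapping, and the helper names `count_components` and `merge_by_cluster` are all hypothetical; in the described pipeline the labels would come from topic modelling rather than a hand-written dictionary.

```python
from collections import defaultdict


def count_components(edges):
    """Count weakly connected components of a graph given as (cause, effect) pairs."""
    adjacency = defaultdict(set)
    for cause, effect in edges:
        # Treat edges as undirected when measuring connectivity.
        adjacency[cause].add(effect)
        adjacency[effect].add(cause)

    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def merge_by_cluster(edges, cluster_of):
    """Replace each argument by its cluster label (if it has one) and drop duplicate edges."""
    return sorted({(cluster_of.get(c, c), cluster_of.get(e, e)) for c, e in edges})


# Hypothetical extracted (cause, effect) pairs; every argument string is unique,
# so each relation initially forms its own disconnected subgraph.
raw_edges = [
    ("oil prices rose", "airline shares fell"),
    ("crude became more expensive", "inflation picked up"),
    ("inflation accelerated", "the central bank raised rates"),
]

# Hypothetical cluster assignment (an assumption standing in for topic modelling).
cluster_of = {
    "oil prices rose": "TOPIC_oil_price_increase",
    "crude became more expensive": "TOPIC_oil_price_increase",
    "inflation picked up": "TOPIC_inflation_rise",
    "inflation accelerated": "TOPIC_inflation_rise",
}

clustered_edges = merge_by_cluster(raw_edges, cluster_of)

print("components before clustering:", count_components(raw_edges))
print("components after clustering:", count_components(clustered_edges))
```

In this toy example, arguments that paraphrase the same event collapse into one node, so separate cause-effect pairs chain together and multi-hop causal paths become visible.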