Many causal claims, such as "sugar causes hyperactivity," are disputed or outdated. Yet research on causality extraction from text has almost entirely neglected counterclaims of causation. To close this gap, we conduct a thorough literature review of causality extraction, compile an extensive inventory of linguistic realizations of countercausal claims, and develop rigorous annotation guidelines that explicitly incorporate countercausal language. We also highlight how counterclaims of causation are an integral part of causal reasoning. Based on our guidelines, we construct a new dataset comprising 1028 causal claims, 952 counterclaims, and 1435 uncausal statements, achieving substantial inter-annotator agreement (Cohen's $\kappa = 0.74$). In our experiments, state-of-the-art models trained solely on causal claims misclassify counterclaims more than 10 times as often as models trained on our dataset.
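For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. The function name `cohens_kappa`, the label strings, and the toy annotations are illustrative assumptions and are not taken from the dataset or the authors' code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy annotations with three classes (not real data).
annotator_a = ["causal", "causal", "counter", "counter", "uncausal",
               "uncausal", "causal", "counter", "uncausal", "uncausal"]
annotator_b = ["causal", "causal", "counter", "uncausal", "uncausal",
               "uncausal", "causal", "counter", "causal", "uncausal"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 8 of 10 items, the chance agreement is 0.34, and the resulting kappa is roughly 0.70.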