Event Causality Identification (ECI) has become a crucial task in Natural Language Processing (NLP), aimed at automatically identifying causal relations between events in text. In this survey, we systematically address the foundational principles, technical frameworks, and challenges of ECI, offering a comprehensive taxonomy that categorizes and clarifies current research methodologies, together with a quantitative assessment of existing models. We first establish a conceptual framework for ECI, outlining key definitions, problem formulations, and evaluation standards. Our taxonomy classifies ECI methods according to two primary tasks: sentence-level (SECI) and document-level (DECI) event causality identification. For SECI, we examine methods based on feature pattern matching, deep semantic encoding, causal knowledge pre-training and prompt-based fine-tuning, and external knowledge enhancement. For DECI, we highlight approaches based on event graph reasoning and prompt-based techniques, which address the complexity of cross-sentence causal inference. We also analyze the strengths, limitations, and open challenges of each approach, and we conduct an extensive quantitative evaluation of various ECI methods on two benchmark datasets. Finally, we explore future research directions, highlighting promising pathways to overcome current limitations and broaden ECI applications.