Generating event graphs from long documents is challenging because it combines several complex tasks: detecting events, identifying their relations, and reconciling unstructured input with structured graphs. Recent studies typically treat all events as equally important and fail to distinguish the salient events that are crucial for understanding narratives. This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. We first prompt LLMs to generate summaries, from which salient events are identified. Next, we develop an iterative code refinement prompting strategy to generate event relation graphs, removing hallucinated relations and recovering missing edges. Contextualised graph generation models fine-tuned on the LLM-generated graphs outperform models trained on CAEVO-generated data. Experimental results on a human-annotated test set show that the proposed method generates salient and more accurate graphs, outperforming competitive baselines.
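The cascade described in the abstract (summarise, extract salient events, then iteratively refine a relation graph) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: `llm` is a hypothetical callable that maps a prompt string to a reply string, the prompt wording is invented, the `head | relation | tail` edge format stands in for the paper's code-style prompts, and dropping edges whose endpoints are not extracted events is only a simple proxy for removing hallucinated relations.

```python
from typing import Callable, List, Set, Tuple

# Assumed edge representation: (head event, relation label, tail event).
Edge = Tuple[str, str, str]


def parse_edges(text: str, events: List[str]) -> Set[Edge]:
    """Parse 'head | relation | tail' lines, keeping only edges between known events."""
    known = set(events)
    edges: Set[Edge] = set()
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip lines that do not follow the assumed format
        head, relation, tail = parts
        # Discard self-loops and edges that mention events we never extracted.
        if head in known and tail in known and head != tail:
            edges.add((head, relation, tail))
    return edges


def salient_event_graph(
    document: str,
    llm: Callable[[str], str],  # hypothetical LLM interface: prompt in, text out
    max_rounds: int = 3,
) -> Tuple[List[str], Set[Edge]]:
    # Stage 1: a summary is used as a proxy for what is salient in the document.
    summary = llm(f"Summarise the document:\n{document}")

    # Stage 2: extract events from the summary, one per line.
    reply = llm(f"List the events in this summary, one per line:\n{summary}")
    events = [line.strip() for line in reply.splitlines() if line.strip()]

    # Stage 3: iteratively ask the model to revise the edge list until it is stable.
    edges: Set[Edge] = set()
    for _ in range(max_rounds):
        current = "\n".join(" | ".join(e) for e in sorted(edges))
        reply = llm(
            "Revise the relation list: delete unsupported relations and add "
            "missing ones, one 'head | relation | tail' per line.\n"
            f"Document:\n{document}\nEvents:\n" + "\n".join(events)
            + f"\nCurrent relations:\n{current}"
        )
        revised = parse_edges(reply, events)
        if revised == edges:
            break  # no change in this round, so stop refining
        edges = revised
    return events, edges
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; in practice `llm` would wrap whichever API is available, and the stopping rule (an unchanged edge set or `max_rounds`) is just one reasonable choice.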