Similar narrative retrieval is a crucial task because narratives are essential for explaining and understanding events, and multiple related narratives together often give a holistic view of an event of interest. To accurately identify semantically similar narratives, this paper proposes a novel narrative similarity metric, Facet-based Narrative Similarity (FaNS), built on the classic 5W1H facets (Who, What, When, Where, Why, and How), which are extracted using state-of-the-art Large Language Models (LLMs). Unlike existing similarity metrics that focus only on overall lexical/semantic match, FaNS matches narratives along each of the six facets independently and then combines the facet-level results. To evaluate FaNS, we created a comprehensive dataset by collecting narratives from AllSides, a third-party news portal. Experimental results show that FaNS achieves a 37\% higher correlation than traditional text similarity metrics that directly measure the lexical/semantic match between narratives, demonstrating its effectiveness in comparing the finer details of a pair of narratives.
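To make the facet-wise idea concrete, the following is a minimal sketch of matching two narratives along the six 5W1H facets and then combining the results. It is not the paper's implementation: the abstract does not specify the per-facet similarity or the combination rule, so the token-level Jaccard overlap, the weighted mean, and the names `facet_similarity` and `fans_sketch` are all assumptions made for illustration. The facet texts are assumed to have been extracted beforehand (in the paper, by an LLM) and are passed in as plain dictionaries.

```python
import re

# The six 5W1H facets named in the abstract.
FACETS = ("who", "what", "when", "where", "why", "how")


def _tokens(text):
    """Lowercase word tokens of a facet string, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def facet_similarity(a, b):
    """Jaccard overlap between two facet strings.

    Assumption: Jaccard is a stand-in; the abstract does not say how
    individual facets are compared.
    """
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    if not union:
        # Both facets empty: treated here as no evidence of a match.
        return 0.0
    return len(ta & tb) / len(union)


def fans_sketch(facets_a, facets_b, weights=None):
    """Compare two narratives facet by facet, then combine.

    facets_a / facets_b: dicts mapping facet name -> extracted text.
    weights: optional dict of per-facet weights (hypothetical parameter);
             defaults to an unweighted mean over the six facets.
    """
    if weights is None:
        weights = {f: 1.0 for f in FACETS}
    total = sum(weights[f] for f in FACETS)
    score = sum(
        weights[f] * facet_similarity(facets_a.get(f, ""), facets_b.get(f, ""))
        for f in FACETS
    )
    return score / total
```

Calling `fans_sketch(a, b)` on two facet dictionaries returns a value in [0, 1]. Keeping the per-facet scores separate until the final step is what allows two narratives that share actors and location but differ in stated cause to be distinguished, which an overall lexical/semantic match would blur.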